Distributed training using multiple GPUs with tensorflow.slim.learning - tensorflow

I understand that TensorFlow supports distributed training.
I found num_clones in train_image_classifier.py, so I can use multiple GPUs locally:
python $TF_MODEL_HOME/slim/train_image_classifier.py \
--num_clones=2 \
--train_dir=${TRAIN_DIR} \
--dataset_name=imagenet \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=vgg_19 \
--batch_size=32 \
--max_number_of_steps=100
How do I use multiple GPUs on different hosts?

You need to use --worker_replicas=<number of hosts> to train on multiple hosts, each with the same number of GPUs. Apart from that, you have to configure --task, --num_ps_tasks, --sync_replicas, and --replicas_to_aggregate if you are training on multiple hosts.
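As a rough illustration only (the values below are assumptions for two hosts with 2 GPUs each and one parameter-server task, not a verified recipe), the launch on the first worker host might look like this, with --task changed on each host:
# Hypothetical sketch: run on worker host 0; on the second host set --task=1.
python $TF_MODEL_HOME/slim/train_image_classifier.py \
--num_clones=2 \
--worker_replicas=2 \
--num_ps_tasks=1 \
--task=0 \
--sync_replicas=True \
--replicas_to_aggregate=2 \
--train_dir=${TRAIN_DIR} \
--dataset_name=imagenet \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=vgg_19 \
--batch_size=32 \
--max_number_of_steps=100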
I'd suggest you give Horovod a try. I'm planning to give it a try in a couple of days.

Related

How to run TensorFlow 2 in a distributed environment with Horovod?

I have successfully set up the distributed environment and run the example with Horovod. And I also know that if I want to run the benchmark on TensorFlow 1 in a distributed setup, e.g. 4 nodes, following the tutorial, the submission should be:
$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 \
python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
--model resnet101 \
--batch_size 64 \
--variable_update horovod \
--data_dir /path/to/imagenet/tfrecords \
--data_name imagenet \
--num_batches=2000
But now I want to run the TensorFlow 2 official models, for example BERT model. What command should I use?

Convert Frozen graph for tfLite for Coral using tflite_convert

I'm using MobileNetV2 and trying to get it working for Google Coral. Everything seems to work except the Coral Web Compiler, which throws a random error: Uncaught application failure. So I think the problem is in the intermediary steps required. For example, I'm using this with tflite_convert:
tflite_convert \
--graph_def_file=optimized_graph.pb \
--output_format=TFLITE \
--output_file=mobilenet_v2_new.tflite \
--inference_type=FLOAT \
--inference_input_type=FLOAT \
--input_arrays=input \
--output_arrays=final_result \
--input_shapes=1,224,224,3
What am I getting wrong?
This is most likely because your model is not quantized. Edge TPU devices do not currently support float-based model inference. For the best results, you should enable quantization during training (described in the link). However, you can also apply quantization during TensorFlow Lite conversion.
With post-training quantization, you sacrifice some accuracy but can test something out more quickly. When you convert your graph to TensorFlow Lite format, set inference_type to QUANTIZED_UINT8. You'll also need to supply the quantization parameters (mean/std_dev and default ranges) on the command line.
tflite_convert \
--graph_def_file=optimized_graph.pb \
--output_format=TFLITE \
--output_file=mobilenet_v2_new.tflite \
--inference_type=QUANTIZED_UINT8 \
--input_arrays=input \
--output_arrays=final_result \
--input_shapes=1,224,224,3 \
--mean_values=128 --std_dev_values=127 \
--default_ranges_min=0 --default_ranges_max=255
You can then pass the quantized .tflite file to the model compiler.
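If you use the offline compiler rather than the web compiler (assuming the edgetpu_compiler CLI is installed locally), that step is a single command; it writes a compiled model with an _edgetpu suffix to the output directory:
# Compile the quantized model for the Edge TPU (hypothetical example filename)
edgetpu_compiler mobilenet_v2_new.tflite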
For more details on the Edge TPU model requirements, check out TensorFlow models on the Edge TPU.

Simple cluster manager for small Tensorflow distributed training?

I'm just getting into distributed training with Tensorflow. At the moment, I run 4 processes on the same computer on different ports:
python trainer.py \
--model models/my_model \
--model_dir model_dir/my_model \
--train_set data/train.csv \
--val_set data/val.csv \
--cluster_spec '{
  "environment": "cloud",
  "cluster": {
    "chief": ["localhost:2221"],
    "worker": ["localhost:2222"],
    "ps": ["localhost:2220"]
  },
  "task": {
    "type": "chief",
    "index": 0
  }
}'
The only thing that changes for each process is the end of the --cluster_spec value, where the task entry is specific to each process's role.
Now I'm thinking about using the three computers I have at home instead of just running different processes on the same machine.
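For example (the hostnames and ports below are hypothetical), the cluster portion of the spec would list one address per machine, while each process still gets its own task block:
--cluster_spec '{
  "environment": "cloud",
  "cluster": {
    "chief": ["machine1.local:2221"],
    "worker": ["machine2.local:2222"],
    "ps": ["machine3.local:2220"]
  },
  "task": {"type": "chief", "index": 0}
}'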
Question
Other than Kubernetes, what cluster management software could I use to simplify launching and watching those four processes across three different computers connected via WiFi? Ideally, this would be something very approachable for someone who's never done automated cluster management before.

Google Cloud ML: Use Nightly TF Import Error No Module tensorflow

I want to train the NMT model from Google on Google Cloud ML.
NMT Model
Now I put all input data in a bucket and downloaded the git repository.
The model needs the nightly version of TensorFlow, so I defined it in setup.py. When I use the CPU version, tf-nightly==1.5.0-dev20171115, and run the following command to train locally on GCP, it works.
Train locally on Google Cloud:
gcloud ml-engine local train --package-path nmt/ \
--module-name nmt.nmt \
-- --src=en --tgt=de \
--hparams_path=$HPARAMAS_PATH \
--out_dir=$OUTPUT_DIR \
--vocab_prefix=$VOCAB_PREFIX \
--train_prefix=$TRAIN_PREFIX \
--dev_prefix=$DEV_PREFIX \
--test_prefix=$TEST_PREFIX
Now when I use the GPU version with the following command, I get this error message a few minutes after submitting the job.
Train on the cloud:
gcloud ml-engine jobs submit training $JOB_NAME \
--runtime-version 1.2 \
--job-dir $JOB_DIR \
--package-path nmt/ \
--module-name nmt.nmt \
--scale-tier BASIC_GPU \
--region $REGION \
-- --src=en --tgt=de \
--hparams_path=$HPARAMAS_PATH \
--out_dir=$OUTPUT_DIR \
--vocab_prefix=$VOCAB_PREFIX \
--train_prefix=$TRAIN_PREFIX \
--dev_prefix=$DEV_PREFIX \
--test_prefix=$TEST_PREFIX
Error:
import tensorflow as tf
ImportError: No module named tensorflow
setup.py:
from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['tf-nightly-gpu==1.5.0-dev20171115']

setup(
    name="nmt",
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    version='0.1.2'
)
Thank you all in advance
Markus
Update:
I have found a note on
GCP docs
Note: Training with TensorFlow versions 1.3+ is limited to CPUs only. See the Cloud ML Engine release notes for updates.
So it seems this doesn't work currently; I think I have to go with Compute Engine instead.
Or is there any hack to get it working?
However, thank you for your help.
TensorFlow 1.5 might need a newer version of CUDA (i.e., CUDA 9), but the version Cloud ML Engine has installed is CUDA 8. Can you please try TensorFlow 1.4 instead, which works on CUDA 8? Please tell us here whether 1.4 works for you, or send us an email at cloudml-feedback@google.com.
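A resubmission following that suggestion might look roughly like this (the runtime version and the tensorflow-gpu==1.4.0 pin are assumptions used to illustrate the idea, not values verified for this repository):
# Hypothetical follow-up: pin a CUDA-8-compatible build in setup.py,
# e.g. REQUIRED_PACKAGES = ['tensorflow-gpu==1.4.0'], then resubmit:
gcloud ml-engine jobs submit training $JOB_NAME \
--runtime-version 1.4 \
--job-dir $JOB_DIR \
--package-path nmt/ \
--module-name nmt.nmt \
--scale-tier BASIC_GPU \
--region $REGION \
-- --src=en --tgt=de \
--hparams_path=$HPARAMAS_PATH \
--out_dir=$OUTPUT_DIR \
--vocab_prefix=$VOCAB_PREFIX \
--train_prefix=$TRAIN_PREFIX \
--dev_prefix=$DEV_PREFIX \
--test_prefix=$TEST_PREFIX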

Keras on Google Cloud ML does not seem to use GPU? Is it possible to make it work?

I tried running Keras with the TensorFlow backend on Cloud ML (Google Cloud Platform), and Keras does not seem to use the GPU. One epoch takes 190 seconds on my CPU, which matches what I see in the dumped logs. Is there a way to identify whether code is running on the GPU or CPU in Keras? Has anybody tried Keras on Cloud ML with the TensorFlow backend?
Update: As of March 2017, GPUs are publicly available. See Fuyang Liu's answer below.
GPUs are not currently available on CloudML. However, they will be in the upcoming months.
Yes, it is supported now.
Basically, you need to add a file such as cloudml-gpu.yaml to your module with the following content:
trainingInput:
  scaleTier: CUSTOM
  # standard_gpu provides 1 GPU. Change to complex_model_m_gpu for 4 GPUs.
  masterType: standard_gpu
  runtimeVersion: "1.0"
Then add the option --config=trainer/cloudml-gpu.yaml (assuming your training module is in a folder called trainer). For example:
export BUCKET_NAME=tf-learn-simple-sentiment
export JOB_NAME="example_5_train_$(date +%Y%m%d_%H%M%S)"
export JOB_DIR=gs://$BUCKET_NAME/$JOB_NAME
export REGION=europe-west1
gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir gs://$BUCKET_NAME/$JOB_NAME \
--runtime-version 1.0 \
--module-name trainer.example5-keras \
--package-path ./trainer \
--region $REGION \
--config=trainer/cloudml-gpu.yaml \
-- \
--train-file gs://tf-learn-simple-sentiment/sentiment_set.pickle
You may also want to check out this URL for the regions where GPUs are available and other related information.
import tensorflow as tf
import keras.backend as K

# Log each op's device placement so you can see whether the GPU is being used.
K.set_session(tf.Session(config=tf.ConfigProto(log_device_placement=True)))
This should make Keras print the device placement of each op to stdout or stderr.
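As another quick check (a plain TensorFlow snippet, not specific to Cloud ML or Keras), you can list the devices TensorFlow can see; if a GPU is available, it should appear as a device of type GPU in the output:
# Lists the devices visible to TensorFlow on the current machine
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"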