Google Colab offers TPUs as a runtime accelerator. I found an example of how to use a TPU in the official TensorFlow GitHub repository, but the example did not work on Google Colaboratory. It got stuck on the following line:
tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)
When I print the available devices on Colab, it returns [] for the TPU accelerator. Does anyone know how to use a TPU on Colab?
Here's a Colab-specific TPU example:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpu_and_keras.ipynb
The key lines are those to connect to the TPU itself:
import os
import tensorflow as tf
# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
...
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
training_model,
strategy=tf.contrib.tpu.TPUDistributionStrategy(
tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
(Unlike a GPU, use of the TPU requires an explicit connection to the TPU worker. So, you'll need to tweak your training and inference definition in order to observe a speedup.)
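The tf.contrib module used above was removed in TensorFlow 2.x. Below is a minimal sketch of the same explicit connection with the TF 2 API, assuming TensorFlow 2.3 or later (where tf.distribute.TPUStrategy exists) and a Colab TPU runtime. The function name connect_to_colab_tpu is hypothetical, and TensorFlow is imported inside the function only so the definition loads on machines without it.

```python
def connect_to_colab_tpu():
    """Connect to the Colab TPU worker and return a tf.distribute.TPUStrategy."""
    # Imported lazily so this definition can be loaded without TensorFlow installed.
    import tensorflow as tf

    # Assumption: on a Colab TPU runtime the argument-less resolver finds the TPU
    # automatically, as in the other TPUClusterResolver snippets on this page.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    # Unlike a GPU, the TPU worker must be connected to and initialized explicitly.
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)
```

The model should then be built and compiled inside "with strategy.scope():" so that its variables are created on the TPU replicas.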
I am currently trying to train a model for my bachelor's thesis.
The training ETA, though, is very long, so I have considered using TPUs. However, every time I try to train with a TPU strategy following this Google notebook, I keep getting the following error:
(0) INTERNAL: {{function_node __inference_train_function_7167}} failed to connect to all addresses
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:{"created":"#1651692210.674048314","description":"Failed to pick subchannel","file":"third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3124,"referenced_errors":[{"created":"#1651692210.674047476","description":"failed to connect to all addresses","file":"third_party/grpc/src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
Executing non-communication op <MultiDeviceIteratorGetNextFromShard> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.
[[RemoteCall]]
[[IteratorGetNextAsOptional]]
[[strided_slice_69/_310]]
You can check my TPU boilerplate code here:
import tensorflow as tf
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
My dataset is stored as images in my Google Drive.
I am trying to train using tf.keras.Model.fit.
Since I have a large dataset and my PC is not very powerful, I thought it would be a good idea to use a TPU on Google Colab.
Here is my TPU configuration:
import tensorflow as tf
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None
if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()
print("REPLICAS: ", strategy.num_replicas_in_sync)
And here is my training call:
hist = model.fit(train_dataset, epochs=10, verbose=1, steps_per_epoch=count_data_items(filenames)//64)
Creating a strategy is not enough; you also have to use it correctly (for example, the model must be built and compiled inside strategy.scope()).
You will probably also have to tune your input pipeline, increase the batch size, etc.
Have a look here: https://cloud.google.com/tpu/docs/performance-guide
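A minimal sketch of what "using the strategy" means with Keras: the model is created and compiled inside strategy.scope(). Here build_model is a hypothetical zero-argument function that returns an uncompiled tf.keras model, and the optimizer, loss and metric are placeholder choices, not taken from the question.

```python
def compile_under_strategy(strategy, build_model):
    """Create and compile a Keras model inside strategy.scope().

    `build_model` is a hypothetical zero-argument function returning an
    uncompiled tf.keras model.
    """
    with strategy.scope():
        # Variables created here become distributed variables on the replicas
        # (the TPU cores) instead of ordinary variables on the host.
        model = build_model()
        model.compile(
            optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
    return model
```

Usage would look like model = compile_under_strategy(strategy, build_model) followed by the same model.fit(...) call as in the question.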
Another important point is that the TPU has a warm-up period: it spends a lot of time building a computation graph during the first calls (and again on every call with a new input shape).
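A common way to avoid the extra compilation caused by a smaller final batch is tf.data's dataset.batch(batch_size, drop_remainder=True). The plain-Python sketch below (hypothetical helper names, no TensorFlow involved) only models which batch sizes batching produces, to show why dropping the remainder leaves a single input shape.

```python
def batch_sizes(num_items, batch_size, drop_remainder=False):
    """Sizes of the batches produced by batching num_items elements."""
    full, rest = divmod(num_items, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        # A smaller final batch is a new input shape, i.e. another compilation.
        sizes.append(rest)
    return sizes


def distinct_batch_shapes(num_items, batch_size, drop_remainder=False):
    """Distinct batch sizes, i.e. the number of different input shapes seen."""
    return sorted(set(batch_sizes(num_items, batch_size, drop_remainder)))
```

For 1000 items in batches of 64, distinct_batch_shapes(1000, 64) gives [40, 64], while drop_remainder=True gives [64] at the cost of skipping the last 40 items.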
The number of TPU cores currently available to Colab notebooks is 8. Takeaways: judging from the training times, the TPU takes considerably longer than the GPU when the batch size is small, but as the batch size increases, TPU performance becomes comparable to that of the GPU. Go through this link for more details.
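Under a distribution strategy, the batch size fed to model.fit is the global batch, which is split evenly across the 8 cores. The sketch below (hypothetical helper tpu_batch_plan) does that arithmetic. In real code the core count should come from strategy.num_replicas_in_sync rather than being hard-coded. If the question's dataset is batched by 64, as its steps_per_epoch suggests, each core sees only 8 examples per step.

```python
def tpu_batch_plan(num_items, per_core_batch, num_cores=8):
    """Return (global_batch_size, steps_per_epoch) for data-parallel TPU training."""
    # Each core processes per_core_batch examples per step.
    global_batch = per_core_batch * num_cores
    # Integer division drops the incomplete last batch.
    steps_per_epoch = num_items // global_batch
    return global_batch, steps_per_epoch
```

For example, tpu_batch_plan(100_000, 128) returns (1024, 97).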
I'm training an object detection model by following the guide here https://towardsdatascience.com/creating-your-own-object-detector-ad69dda69c85
On Google Colab, I am able to execute the following, and it makes use of the GPU:
python train.py --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync.config
I would now like to train using the TPU, but this obviously does not just work out of the box. Running train.py is slow and appears to use only the CPU. How can I achieve this?
When using a TPU in Google Colab, we can use the code below to check that the TPU devices are properly recognized in the environment:
import os
import pprint
import tensorflow as tf
if 'COLAB_TPU_ADDR' not in os.environ:
    print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    print('TPU address is', tpu_address)
    with tf.Session(tpu_address) as session:
        devices = session.list_devices()
    print('TPU devices:')
    pprint.pprint(devices)
This should output a list of 8 TPU devices available in our Colab environment.
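tf.Session no longer exists in TensorFlow 2.x. A rough equivalent of the device check, assuming the TPU has already been connected and initialized (as in the TPUClusterResolver boilerplate earlier on this page), is to list the logical TPU devices. The helper name list_tpu_devices is hypothetical.

```python
def list_tpu_devices():
    """Return the names of the logical TPU devices visible to TensorFlow 2.x."""
    # Imported lazily so this definition can be loaded without TensorFlow installed.
    import tensorflow as tf

    # TPU devices only show up after experimental_connect_to_cluster and
    # initialize_tpu_system have been called for the resolver.
    return [device.name for device in tf.config.list_logical_devices("TPU")]
```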
In order to run a tf.keras model on a TPU, we have to convert it to a TPU model using the tf.contrib.tpu.keras_to_tpu_model function.
This can be done with the code below:
# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
tf.logging.set_verbosity(tf.logging.INFO)
resnet_model = tf.contrib.tpu.keras_to_tpu_model(
resnet_model,
strategy=tf.contrib.tpu.TPUDistributionStrategy(
tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
For more information, please refer to this Medium link and this link.
I wanted to use tf.contrib.distribute.MirroredStrategy() on my multi-GPU system, but it doesn't use the GPUs for training (see the output below). I am running tensorflow-gpu 1.12.
I did try to specify the GPUs directly in the MirroredStrategy, but the same problem appeared.
model = models.Model(inputs=input, outputs=y_output)
optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
model.compile(loss=lossFunc, optimizer=optimizer)
NUM_GPUS = 2
strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=NUM_GPUS)
config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.keras.estimator.model_to_estimator(model,
config=config)
These are the results I am getting:
INFO:tensorflow:Device is available but not used by distribute strategy: /device:CPU:0
INFO:tensorflow:Device is available but not used by distribute strategy: /device:GPU:0
INFO:tensorflow:Device is available but not used by distribute strategy: /device:GPU:1
WARNING:tensorflow:Not all devices in DistributionStrategy are visible to TensorFlow session.
The expected result would obviously be for training to run on both GPUs. Are these known issues?
I've been facing a similar issue with MirroredStrategy failing on TensorFlow 1.13.1 with 2x RTX 2080 running an Estimator.
The failure seems to be in the NCCL all_reduce method (error message: no OpKernel registered for NCCL AllReduce).
I got it to run by changing from NCCL to hierarchical_copy, which meant using the contrib cross_device_ops methods as follows:
Failed command:
mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0","/gpu:1"])
Successful command:
mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0","/gpu:1"],
cross_device_ops=tf.contrib.distribute.AllReduceCrossDeviceOps(
all_reduce_alg="hierarchical_copy")
)
In newer TensorFlow versions, AllReduceCrossDeviceOps no longer exists. You can use tf.distribute.HierarchicalCopyAllReduce() instead:
mirrored_strategy = tf.distribute.MirroredStrategy(devices= ["/gpu:0","/gpu:1"],cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
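If you want to switch between the default NCCL all-reduce and the hierarchical-copy workaround without editing code, a small factory like the sketch below may help. It assumes TensorFlow 2.x, where both tf.distribute.NcclAllReduce and tf.distribute.HierarchicalCopyAllReduce exist; the function name and the use_nccl flag are hypothetical.

```python
def make_mirrored_strategy(devices=("/gpu:0", "/gpu:1"), use_nccl=False):
    """Build a MirroredStrategy with an explicit cross-device reduction method."""
    # Imported lazily so this definition can be loaded without TensorFlow installed.
    import tensorflow as tf

    if use_nccl:
        cross_device_ops = tf.distribute.NcclAllReduce()
    else:
        # The workaround described above: avoid NCCL entirely.
        cross_device_ops = tf.distribute.HierarchicalCopyAllReduce()
    return tf.distribute.MirroredStrategy(
        devices=list(devices), cross_device_ops=cross_device_ops
    )
```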
Maybe my question is a bit naive, but I really didn't find anything in the TensorFlow documentation.
I have a trained TensorFlow model whose variables were placed on the GPU. Now I would like to restore this model and test it on the CPU.
If I do this via tf.train.Saver.restore, as in the example:
saver = tf.train.import_meta_graph("/tmp/graph.meta")
saver.restore(session, "/tmp/model.ckp")
I get the following exception:
InvalidArgumentError: Cannot assign a device to node 'b_fc8/b_fc8/Adam_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
How can I restore these variables on the CPU?
Use the clear_devices flag, i.e.:
saver = tf.train.import_meta_graph("/tmp/graph.meta", clear_devices=True)
I'm using TensorFlow 0.12, and clear_devices=True together with tf.device('/cpu:0') was not working for me (saver.restore was still trying to assign variables to /gpu:0).
I really needed to force everything onto /cpu:0, since I was loading several models which wouldn't fit in GPU memory anyway. Here are two alternatives to force everything onto /cpu:0:
Set os.environ['CUDA_VISIBLE_DEVICES'] = '' before TensorFlow initializes the GPUs (e.g. at the top of the script, before importing TensorFlow).
Use the device_count field of ConfigProto, e.g. tf.Session(config=tf.ConfigProto(device_count={"GPU": 0, "CPU": 1}))
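Combining clear_devices=True with a GPU-free session config might look like the sketch below, written against the tf.compat.v1 API so it also runs under TensorFlow 2.x. The function name is hypothetical; everything runs inside the imported graph's context because Saver.restore takes an eager code path otherwise.

```python
def restore_on_cpu(meta_path, checkpoint_path):
    """Restore a TF1-style checkpoint on the CPU; returns (graph, session)."""
    # Imported lazily so this definition can be loaded without TensorFlow installed.
    import tensorflow as tf

    tf1 = tf.compat.v1
    graph = tf1.Graph()
    with graph.as_default():
        # clear_devices=True drops the '/device:GPU:0' placements stored in the meta graph.
        saver = tf1.train.import_meta_graph(meta_path, clear_devices=True)
        # device_count={"GPU": 0} keeps this session from using any GPU.
        config = tf1.ConfigProto(device_count={"GPU": 0})
        session = tf1.Session(graph=graph, config=config)
        saver.restore(session, checkpoint_path)
    return graph, session
```

With the paths from the question, this would be called as graph, session = restore_on_cpu("/tmp/graph.meta", "/tmp/model.ckp").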