Tensorflow contrib.learn.Estimator multi-GPU

In order to use the contrib.learn.Estimator for multi-GPU training, I am attempting to specify GPU assignments in my model_fn.
In pseudo-code:
def model_fn(X, y):
    with tf.device('/gpu:1'):
        ... various tensorflow ops for model ...
    return predictions, loss, train_op
Everything works fine without the tf.device('/gpu:1') call, but with it I encounter the following error:
InvalidArgumentError (see above for traceback): Cannot assign a device to
node 'save/ShardedFilename_1': Could not satisfy explicit device
specification '/device:GPU:1' because no supported kernel
for GPU devices is available.
I do not believe that I am adding the offending op to the graph myself, but rather that it is injected through the Estimator's snapshot functionality.
I believe that the solution is to set allow_soft_placement=True so that ops without a GPU kernel fall back to the CPU, but it's not obvious to me how that option is exposed when dealing with contrib.learn.Estimator.
I see that the option is usually set in a ConfigProto and passed to the session, but I've been letting the Estimator manage the session for me. Should I be taking control of session creation, or am I missing a parameter somewhere to accomplish this?
Many thanks in advance for any advice.

This was fixed along with Estimator's move out of contrib in TensorFlow 1.0.
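With the tf.estimator version you can pass the session options through RunConfig. Here is a minimal sketch, assuming TF 1.x and a toy model body standing in for the real one:

import tensorflow as tf

# Sketch (TF 1.x tf.estimator API): allow_soft_placement lets ops without a GPU
# kernel (e.g. the Saver's ShardedFilename) fall back to the CPU.
session_config = tf.ConfigProto(allow_soft_placement=True)
run_config = tf.estimator.RunConfig(session_config=session_config)

def model_fn(features, labels, mode):
    with tf.device('/gpu:1'):
        logits = tf.layers.dense(features['x'], 10)  # toy stand-in for the real model
    loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)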

Related

'use_multiprocessing=True' in Mask RCNN with Keras 2.x & Tensorflow 2.x

I am using Keras 2.9.0 and Tensorflow 2.9.2 and have already made the necessary changes to compile the Mask-RCNN model (there are many compatibility issues, as it is a 2017 model).
The code is running on Colab with GPU.
I get now the following warning:
WARNING:tensorflow:Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
This comes from the following lines in the model.py file:
self.keras_model.fit(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=workers,
    use_multiprocessing=True,
)
The problem is that the training is stuck at Epoch 1/XXX before even starting to really train.
I am pretty sure this is due to the multiprocessing warning (based on other threads here).
This thread refers here, but it uses a very different approach to generating the data than Mask RCNN does, so I'd like to avoid making such a big change (it could potentially create many other issues).
Moreover, if I set use_multiprocessing=False (default) there is the following error:
RuntimeError: Your generator is NOT thread-safe.Keras requires a thread-safe generator when use_multiprocessing=False, workers > 1
As far as I understand, the solutions suggested here do not apply directly to the Mask-RCNN model.
Question: is there a way to resolve this issue with Mask-RCNN, preferably while keeping the option to run with multiprocessing (to be faster)?
EDIT:
Even if I reduce the original number of workers (12) to 1 (as hinted in the warning message), the model is still stuck at the same stage.
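For reference, the keras.utils.Sequence class suggested by the warning is used roughly like this; this is a generic sketch, not the actual Mask-RCNN data pipeline:

import numpy as np
from tensorflow import keras

# Generic sketch: a Sequence-backed dataset is indexable, so Keras can shard it
# safely across workers instead of pulling from a shared generator.
class BatchSequence(keras.utils.Sequence):
    def __init__(self, images, targets, batch_size=8):
        self.images, self.targets, self.batch_size = images, targets, batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.images[s], self.targets[s]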

create_training_graph() failed when converting MobileFacenet to a quantize-aware model with TF-Lite

I am trying to quantize MobileFacenet (code from sirius-ai) according to the suggestion
and I think I met the same issue as this one
When I add tf.contrib.quantize.create_training_graph() to the training graph
(train_nets.py line 187, before train_op = train(...), or inside train() at utils/common.py line 38, before the gradients),
it does not add quantize-aware ops into the graph to collect the dynamic-range max/min.
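For context, the placement I followed is roughly the documented contrib.quantize flow below (a simplified TF 1.x sketch with a toy network standing in for MobileFaceNet, not the sirius-ai code):

import tensorflow as tf

# Build the forward pass and loss first, rewrite the graph, then create the optimizer.
x = tf.placeholder(tf.float32, [None, 112, 112, 3])
labels = tf.placeholder(tf.int64, [None])
net = tf.layers.conv2d(x, 8, 3, activation=tf.nn.relu)  # toy stand-in for MobileFaceNet
net = tf.reduce_mean(net, axis=[1, 2])
logits = tf.layers.dense(net, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

# Insert fake-quant ops; quant_delay postpones quantization during warm-up steps.
tf.contrib.quantize.create_training_graph(input_graph=tf.get_default_graph(),
                                          quant_delay=2000)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)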
I assumed I would see some additional nodes in TensorBoard, but I did not, so I think I did not successfully add the quantize-aware ops to the training graph.
I also tried tracing through TensorFlow and found that _FindLayersToQuantize() returned nothing.
However, when I add tf.contrib.quantize.create_eval_graph() to refine the training graph, I can see some quantize-aware ops such as act_quant...
Since I did not add the ops to the training graph successfully, I have no weights to load into the eval graph.
Thus I get error messages such as:
Key MobileFaceNet/Logits/LinearConv1x1/act_quant/max not found in checkpoint
or
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobileFaceNet/Logits/LinearConv1x1/act_quant/max
Does anyone know how to fix this error, or how to get a quantized MobileFacenet with good accuracy?
Thanks!
Hi,
Unfortunately, the contrib/quantize tool is now deprecated. It won't be able to support newer models, and we are not working on it anymore.
If you are interested in QAT, I would recommend trying the new TF/Keras QAT API. We are actively developing that and providing support for it.
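A minimal sketch of that Keras QAT API (tensorflow_model_optimization), using a toy model as a stand-in for MobileFaceNet:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy stand-in for MobileFaceNet; quantize_model wraps the layers with fake-quant ops.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(112, 112, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# Fine-tune with q_aware_model.fit(...), then convert with tf.lite.TFLiteConverter.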

Running a Tensorflow model inference script on multiple GPUs

I'm trying to run model scoring (the inference graph) from the TensorFlow Object Detection API on multiple GPUs. I tried specifying the GPU number in main, but it runs on only a single GPU. I placed a GPU utilization snapshot here.
I'm using tensorflow-gpu==1.13.1; can you kindly point out what I'm missing here?
for i in range(2):
    with tf.device('/gpu:{}'.format(i)):
        tf_init()
        init = tf.global_variables_initializer()
        with detection_graph.as_default():
            with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
                # call to run_inference_multiple_images function
The responses to this question should give you a few options for fixing this.
Usually TensorFlow will occupy all visible GPUs unless told otherwise. So if you haven't already tried, you could just remove the with tf.device line (assuming you only have the two GPUs) and TensorFlow should use them both.
Otherwise, I think the easiest option is to set the environment variable with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1".
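A minimal sketch of that approach, assuming two GPUs and TF 1.x:

import os
# Must be set before TensorFlow initializes CUDA so both GPUs stay visible.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import tensorflow as tf
from tensorflow.python.client import device_lib

# Both GPUs should show up in the device list.
print([d.name for d in device_lib.list_local_devices()])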

How does Tensorflow support using optimizers with custom ops?

I've made a new op and I'd like to use it with AdamOptimizer. I've created a gradient for it following the instructions here and added it to my optimizer's var_list but Tensorflow says that my variable doesn't have a processor.
Is there support for Tensorflow custom ops in optimizers?
Does the optimizer class let me create a new processor or would I have to rewrite part of compute_gradients?
Also, what does automatic differentiation mean, as stated by the TF docs:
To make automatic differentiation work for new ops, you must register a gradient function which computes gradients with respect to the ops' inputs given gradients with respect to the ops' outputs.
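For context, the registration I followed looks roughly like this (a simplified sketch with a hypothetical op name, not my actual op):

import tensorflow as tf

# The registered function receives the op and the gradient w.r.t. its outputs,
# and must return one gradient per input of the op. Here "MySquare" is a
# hypothetical custom op computing x**2.
@tf.RegisterGradient("MySquare")
def _my_square_grad(op, grad):
    x = op.inputs[0]
    return grad * 2.0 * x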
Thanks!
So I found out that what I was doing is not supported by the TensorFlow optimizers.
I was trying to create an op that would act like a TensorFlow variable (i.e. get updated by the functions inside Optimizer::minimize()). However, TF does something with processors and Eigen::Tensors under the hood to apply gradients in minimize() that I don't fully understand, and naturally this doesn't work with Op classes.

Tensorflow, restore variables in a specific device

Maybe my question is a bit naive, but I really didn't find anything in the tensorflow documentation.
I have a trained tensorflow model whose variables were placed on the GPU. Now I would like to restore this model and test it using the CPU.
If I do this via `tf.train.Saver.restore` as in the example:
saver = tf.train.import_meta_graph("/tmp/graph.meta")
saver.restore(session, "/tmp/model.ckp")
I get the following exception:
InvalidArgumentError: Cannot assign a device to node 'b_fc8/b_fc8/Adam_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
How can I restore these variables on the CPU?
Thanks
Use the clear_devices flag, i.e.
saver = tf.train.import_meta_graph("/tmp/graph.meta", clear_devices=True)
I'm using tensorflow 0.12, and clear_devices=True and tf.device('/cpu:0') were not working for me (saver.restore was still trying to assign variables to /gpu:0).
I really needed to force everything onto /cpu:0, since I was loading several models which wouldn't fit in GPU memory anyway. Here are two alternatives to force everything onto /cpu:0:
Set os.environ['CUDA_VISIBLE_DEVICES'] = ''
Use the device_count field of ConfigProto, e.g. tf.Session(config=tf.ConfigProto(device_count={"GPU": 0, "CPU": 1}))
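Putting those together, a CPU-only restore could look roughly like this (a sketch reusing the paths from the question):

import os
import tensorflow as tf

# Either hide all GPUs from the process...
os.environ['CUDA_VISIBLE_DEVICES'] = ''
# ...or forbid GPU devices at the session level.
config = tf.ConfigProto(device_count={'GPU': 0})

with tf.Session(config=config) as sess:
    saver = tf.train.import_meta_graph("/tmp/graph.meta", clear_devices=True)
    saver.restore(sess, "/tmp/model.ckp")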