'use_multiprocessing=True' in Mask RCNN with Keras 2.x & Tensorflow 2.x

I am using Keras 2.9.0 and TensorFlow 2.9.2 and have already made the changes needed to compile the Mask-RCNN model (there are many compatibility issues, as it is a 2017 model).
The code is running on Colab with GPU.
I now get the following warning:
WARNING:tensorflow:Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
This comes from the following lines in the model.py file:
The problem is that training is stuck at Epoch 1/XXX before it even really starts to train.
I am pretty sure this is due to the multiprocessing warning (based on other threads here).
This thread is referenced here, but it takes a very different approach to generating the data than Mask RCNN does, so I'd like to avoid making such a big change (it could create many other issues).
Moreover, if I set use_multiprocessing=False (default) there is the following error:
RuntimeError: Your generator is NOT thread-safe. Keras requires a thread-safe generator when use_multiprocessing=False, workers > 1
As far as I understand, the solutions suggested here do not apply directly to the Mask-RCNN model.
Question: is there a way to resolve the issue with Mask-RCNN, preferably keeping the option to run with multiprocessing (to be faster)?
Even if I reduce the original number of workers (12) to 1 (as hinted in the warning message), the model is still stuck at the same stage.
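For the thread-safety error specifically, one commonly used workaround is to serialize access to the generator with a lock, so that several worker threads can call next() on it without colliding. The following is only a minimal sketch under assumptions: ThreadSafeIterator, threadsafe_generator and toy_data_generator are hypothetical names, the toy generator merely stands in for the Mask RCNN data generator, and whether this also resolves the hang at Epoch 1 has not been verified.

```python
import threading


class ThreadSafeIterator:
    """Wraps an iterator so that only one thread at a time can advance it."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The lock prevents "generator already executing" errors.
        with self._lock:
            return next(self._iterator)


def threadsafe_generator(generator_func):
    """Decorator: the generator function now returns a locked iterator."""
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(generator_func(*args, **kwargs))
    return wrapper


@threadsafe_generator
def toy_data_generator(n_batches):
    # Hypothetical stand-in for the real data generator.
    for batch_index in range(n_batches):
        yield batch_index


def consume(iterator, sink):
    """Drain the shared iterator into a shared list."""
    for item in iterator:
        sink.append(item)


# Four threads pull from the same generator, as workers > 1 would.
gen = toy_data_generator(1000)
results = []
workers = [threading.Thread(target=consume, args=(gen, results)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

In a real setup the decorator would be applied to the existing generator function, leaving its body untouched. The lock means batches are still produced one at a time, so this trades away some of the parallel speed-up for safety.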


Different optimization with different TF versions

I'm trying to train a convolutional neural network with Keras and TensorFlow version 2.6; I also trained it with TensorFlow version 1.11. I think I did the migration correctly (both networks converged), but the results are very different, and worse in TF2.6. I used the Adam optimizer in both cases with the same hyperparameters (learning_rate = 0.001), but the loss is optimized better in TF1.11 than in TF2.6.
I'm trying to find out where the differences could come from. What must be taken into account when working with different TF versions? Can there be important numerical differences? I know that in TF1.x the default mode is graph and in TF2 the default is eager; I don't know whether this could change the training behavior.
It surprises me how much the loss is reduced in the first epochs, reaching a lower value at the end of training.
Your understanding is correct: the two versions work in different modes, eager and graph. The loss function, however, is defined by how much the optimized value has to change, as calculated by your own method or the configured one.
You cannot directly compare one model's training history to another's. Running it several times, you may find that TF 1 is faster and reaches smaller loss values; to see why, you need to review the changelog.
The loss functions have been updated. Graph mode is a powerful technique, but TF 2.x supports access to values at its level, which is why you have easily delegated methods such as callbacks, dynamic functions, and value updates at runtime. (It is a useful experiment for students or users to compare both versions on the same tasks.)
Symmetry in the methods does not create different results.
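The point about not comparing a single training history can be made concrete by repeating each configuration over several seeds and comparing the spread of final losses rather than one run. The sketch below is illustrative only: toy_final_loss is a hypothetical stand-in for a full training run (it minimizes a one-dimensional quadratic with plain gradient descent from a random start), summarize_runs is a hypothetical helper, and none of this says anything about the actual TF 1.11 or TF 2.6 results.

```python
import random
import statistics


def toy_final_loss(seed, learning_rate=0.1, steps=20):
    """Hypothetical stand-in for one training run: gradient descent on w**2."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # random initialization, as in real training
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # the gradient of w**2 is 2*w
    return w * w


def summarize_runs(train_fn, seeds):
    """Run train_fn once per seed and return (mean, standard deviation)."""
    losses = [train_fn(seed) for seed in seeds]
    return statistics.mean(losses), statistics.stdev(losses)


mean_loss, std_loss = summarize_runs(toy_final_loss, seeds=range(5))
print("final loss: {:.3e} +/- {:.3e}".format(mean_loss, std_loss))
```

In the real setting, train_fn would build and fit the model in one TF version with the given seed. If the ranges for the two versions overlap, the observed gap may be run-to-run noise rather than a version difference.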

Set batch size of trained keras model to 1

I have a Keras model trained on my own dataset. However, after loading the weights, the summary shows None as the first dimension (the batch size).
I want to know how to fix the shape to a batch size of 1, because I must fix it in order to convert the model to tflite with GPU support.
What worked for me was to specify batch size to the Input layer, like this:
input = layers.Input(shape=input_shape, batch_size=1, dtype='float32', name='images')
This then carried through the rest of the layers.
The bad news is that despite this "fix" the TFLite runtime still complains about dynamic tensors. I get these non-fatal errors in logcat when it runs:
E/tflite: third_party/tensorflow/lite/core/subgraph.cc:801 tensor.data.raw != nullptr was not true.
E/tflite: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#26 is a dynamic-sized tensor).
E/tflite: Ignoring failed application of the default TensorFlow Lite delegate indexed at 0.
The good news is that despite these errors it seems to be using the GPU anyway, based on performance testing.
I'm using:
Hopefully, they'll fix the runtime so it doesn't matter whether the batch size is 'None'. It shouldn't matter for doing inference.

create_training_graph() failed when converting MobileFacenet to a quantize-aware model with TF-lite

I am trying to quantize MobileFacenet (code from sirius-ai) according to the suggestion,
and I think I have hit the same issue as this one.
When I add tf.contrib.quantize.create_training_graph() to the training graph
(train_nets.py ln.187: before train_op = train(...) or in train() utils/common.py ln.38 before gradients),
it does not add quantize-aware ops to the graph to collect the dynamic range max/min.
I assumed I would see some additional nodes in TensorBoard, but I did not, so I think the quantize-aware ops were not successfully added to the training graph.
I tried tracing through TensorFlow and found that _FindLayersToQuantize() returns nothing.
However, when I add tf.contrib.quantize.create_eval_graph() to refine the training graph, I can see some quantize-aware ops such as act_quant...
Since the ops were not added to the training graph, I have no weights to load into the eval graph.
So I get error messages such as:
Key MobileFaceNet/Logits/LinearConv1x1/act_quant/max not found in checkpoint
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobileFaceNet/Logits/LinearConv1x1/act_quant/max
Does anyone know how to fix this error, or how to get a quantized MobileFacenet with good accuracy?
Unfortunately, the contrib/quantize tool is now deprecated. It won't be able to support newer models, and we are not working on it anymore.
If you are interested in QAT, I would recommend trying the new TF/Keras QAT API. We are actively developing that and providing support for it.

Running Tensorflow model inference script on multiple GPU

I'm trying to run model scoring (the inference graph) from the TensorFlow Object Detection API on multiple GPUs. I tried specifying the GPU number in main, but it runs on only a single GPU. I have placed a GPU utilization snapshot here.
I'm using tensorflow-gpu==1.13.1. Can you kindly point out what I'm missing here?
for i in range(2):
    with tf.device('/gpu:{}'.format(i)):
        init = tf.global_variables_initializer
        with detection_graph.as_default():
            with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
                # call to run_inference_multiple_images function
The responses to this question should give you a few options for fixing this.
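Usually TensorFlow will claim memory on all visible GPUs unless told otherwise. So if you haven't already tried, you could just remove the with tf.device line (assuming you only have the two GPUs) and check whether TensorFlow uses them both. Keep in mind that memory being allocated on a GPU does not by itself mean that ops are running on it.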
Otherwise, I think the easiest is setting the environment variables with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1".
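If a single process still ends up using one GPU, another option is to launch one worker process per GPU and give each its own share of the images. The sketch below only shows the bookkeeping: shard_round_robin and worker_environment are hypothetical helper names, the image file names are placeholders, and actually launching the inference script is left out. It relies on CUDA_VISIBLE_DEVICES being present in a process's environment before TensorFlow initializes CUDA.

```python
import os


def shard_round_robin(items, num_shards):
    """Split items into num_shards lists, dealing them out in turn."""
    return [items[i::num_shards] for i in range(num_shards)]


def worker_environment(gpu_index):
    """Copy of the current environment that exposes only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


image_paths = ["img_{}.jpg".format(i) for i in range(5)]  # placeholder names
shards = shard_round_robin(image_paths, num_shards=2)
envs = [worker_environment(gpu) for gpu in range(2)]

for gpu, shard in enumerate(shards):
    print("GPU {} would process {}".format(gpu, shard))
```

Each (shard, env) pair could then be handed to subprocess.Popen(..., env=env) for a script that runs the existing single-GPU inference on its shard. Inside each worker the one visible GPU appears as device 0, so no tf.device call is needed.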

Tensorflow contrib.learn.Estimator multi-GPU

In order to use the contrib.learn.Estimator for multi-GPU training, I am attempting to specify GPU assignments in my model_fn.
In pseudo-code:
def model_fn(X, y):
    with tf.device('/gpu:1'):
        ... various tensorflow ops for model ...
        return predictions, loss, train_op
Everything works fine without the tf.device('/gpu:1') call, but with it I encounter the following error:
InvalidArgumentError (see above for traceback): Cannot assign a device to
node 'save/ShardedFilename_1': Could not satisfy explicit device
specification '/device:GPU:1' because no supported kernel
for GPU devices is available.
I do not believe that I am adding the offending op to the graph myself, but rather that it is injected through the Estimator's snapshot functionality.
I believe that the solution is to set allow_soft_placement=True so that non-GPU functions fall back to the CPU, but it's not obvious to me how that is exposed when dealing with contrib.learn.Estimator.
I see that the option is usually set in ConfigProto & passed to the session, but I've been using the Estimator's functionality to manage the session for me. Should I be taking control of the session creation, or am I missing a parameter somewhere to accomplish this?
Many thanks in advance for any advice.
This was fixed when Estimator moved out of contrib in TensorFlow 1.0.