How to make predictions with Mask R-CNN and python 3.10 - tensorflow2.0

My problem:
I have weights for a Mask R-CNN model that was trained using Python 3.7 and TensorFlow 1.13.1. I can use this environment to make predictions.
I am able to reproduce those predictions using Python 3.8, loading the weights with the Mask R-CNN port for TensorFlow 2 and TensorFlow 2.4.1.
When I use Python 3.10 and TensorFlow 2.9.1, the prediction process runs without any errors, but the results do not make any sense: the class instances are just a few randomly distributed specks. The results look similar for Python 3.8 with TensorFlow 2.9.1.
Where I'm at
I need to use Python 3.10; I don't care about the TensorFlow version. I found requirements for an environment that should work for Python 3.9 with TensorFlow 2.7, but to my understanding Python 3.10 needs TensorFlow 2.8 or higher.
What I need
I have no experience with TensorFlow or Mask R-CNN, so I don't really know where to start. Has anyone encountered this kind of problem before? Is it typical, and does it point in a particular direction?
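One place to start (a sketch, assuming a matterport-style Mask_RCNN fork and an .h5 weights file; the config values, paths and dummy image below are placeholders) is to confirm in the new environment that the weights load layer by layer and that detection runs at all:

import numpy as np
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):
    NAME = "my_model"          # placeholder
    NUM_CLASSES = 1 + 1        # background + your classes
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="logs")
model.load_weights("mask_rcnn_weights.h5", by_name=True)   # by_name matters when layer sets differ
image = np.zeros((512, 512, 3), dtype=np.uint8)            # replace with a real test image
results = model.detect([image], verbose=1)                 # list with one result dict per image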

Related

How to perform quantize aware training with tensorflow 1.15?

I am using TensorFlow 1.15 due to dependencies on multiple other modules and am struggling to do quantization-aware training. I came across tensorflow_model_optimization, but it works with TensorFlow 2.x. Is there any way quantization can be performed during training with TensorFlow 1.15?
This is not possible, as quantization-aware training was introduced in TF 2.0 and there is no workaround for it in 1.x. Please upgrade to 2.x and let us know if you face any issues.
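For reference, the TF 2.x flow with tensorflow_model_optimization looks roughly like this (a minimal sketch; the Sequential model and training data are placeholders):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder Keras model; substitute your own architecture.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Wrap the model so fake-quantization ops are inserted during training.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
# qat_model.fit(x_train, y_train, epochs=1)  # train as usual, then convert to TFLite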

What are the main differences between TensorFlow Lite, TensorFlow-TRT and TensorRT?

I am using the Coral devboard and the Nvidia Jetson TX2. And that is how I got to know about TensorFlow-Lite, TensorFlow-TRT and TensorRT.
I have some questions about them:
Between TensorFlow-TRT and TensorRT:
When using a fully optimised/compatible graph with TensorRT, which one is faster and why?
The pipeline to use TFLite on the Google Coral (when using TensorFlow 1.x...) is:
a. Use a model available in TensorFlow's model zoo
b. Convert the model to a frozen graph (.pb)
c. Use protobuf to serialize the graph
d. Convert to TFLite
e. Apply quantization (INT8)
f. Compile
What would be the pipeline when using TensorFlow-TRT and TensorRT?
Is there somewhere I can find good documentation about it?
So far I think TensorRT is closer to TensorFlow Lite because:
TFLite: after compilation you end up with a .quant.edgetpu.tflite file which can be used to run inference on the devboard
TensorRT: you end up with a .plan file to run inference on the devboard.
Thank you for the answers, and if you can point me to documentation which compares them, that will be appreciated.
TensorRT is a very fast CUDA runtime for GPU only. I am using an NVIDIA Jetson Xavier NX with TensorFlow models converted to TensorRT, running on the TensorFlow-TensorRT (TF-TRT) runtime. The benefit of the TF-TRT runtime is that any operations not supported by TensorRT fall back to TensorFlow.
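To make the TF-TRT part of the question concrete, the TF 2.x conversion looks roughly like this (a sketch; "my_saved_model" is a hypothetical SavedModel directory):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir="my_saved_model")
converter.convert()                    # supported subgraphs are replaced by TensorRT engines
converter.save("my_saved_model_trt")   # unsupported ops remain ordinary TensorFlow ops
# Lower precision (FP16/INT8) can be requested via the converter's conversion
# parameters; the exact argument names differ between TF 2.x releases.

A standalone TensorRT .plan, by contrast, is built with NVIDIA's own tools (e.g. trtexec or the TensorRT Python API) from an exported model such as ONNX, and every op must be supported by TensorRT or implemented as a plugin.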
I have not tried TensorFlow Lite, but I understand it as a reduced TF for inference only on "small devices". It can support the GPU, but only for a limited set of operations, and I think there are no Python bindings (currently).
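For steps b-e of the pipeline listed in the question, the TF 1.x converter calls look roughly like this (a sketch; the graph file, tensor names, shapes and calibration data are all placeholders):

import numpy as np
import tensorflow as tf  # TF 1.x

def representative_dataset():
    # Placeholder calibration data; feed real preprocessed images here.
    for _ in range(100):
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    "frozen_inference_graph.pb",                     # steps b/c: frozen, serialized graph
    input_arrays=["image_tensor"],                   # hypothetical input name
    output_arrays=["detections"],                    # hypothetical output name
    input_shapes={"image_tensor": [1, 300, 300, 3]})
converter.optimizations = [tf.lite.Optimize.DEFAULT]                         # step e
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # full INT8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
open("model_quant.tflite", "wb").write(converter.convert())
# Step f: compile for the Edge TPU with `edgetpu_compiler model_quant.tflite`.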

GPU support for TensorFlow & PyTorch

Okay, so I've worked on a bunch of Deep Learning projects and internships now and I've never had to do heavy training. But lately I've been thinking of doing some Transfer Learning for which I'll need to run my code on a GPU. Now I have a system with Windows 10 and a dedicated NVIDIA GeForce 940M GPU. I've been doing a lot of research online, but I'm still a bit confused. I haven't installed the NVIDIA Cuda Toolkit or cuDNN or tensorflow-gpu on my system yet. I currently use tensorflow and pytorch to train my DL models. Here are my queries -
When I define a tensor in tf or pytorch, it is a CPU tensor by default. So, all the training I've been doing so far has been on the CPU. So, if I make sure to install the correct versions of CUDA and cuDNN and tensorflow-gpu (specifically for tensorflow), I can run my models on my GPU using tf-gpu and pytorch and that's it? (I'm aware of torch.cuda.is_available() in pytorch to ensure pytorch can access my GPU, and the device_lib module in tf to check if my GPU is visible to tensorflow.) (I'm also aware that tf doesn't support all Nvidia GPUs.)
Why does tf have a separate module for GPU support? PyTorch doesn't seem to have that; all you need to do is cast your tensor from cpu() to cuda() to switch between them.
Why install cuDNN? I know it is a high-level API built on CUDA to support training deep neural nets on the GPU. But do tf-gpu and torch use it in the backend while training on the GPU?
After tf == 1.15, did they combine CPU and GPU support all into one package?
First of all, unfortunately, the 940M is a rather weak GPU for training. I suggest you use Google Colab for faster training, although the 940M will of course still be faster than the CPU. So here are my answers to your four questions.
1-) Yes, if you install the requirements correctly, then you can run on the GPU. You can manually place your data on the GPU as well; you can check how this is done in the TensorFlow documentation. In PyTorch, you should specify the device you want to use: as you said, do device = torch.device("cuda" if args.cuda else "cpu"), then always call .to(device) on your models and data. It will then automatically use the GPU if one is available (see the sketch at the end of this thread).
2-) PyTorch also needs an extra installation (the CUDA-enabled build) for GPU support. However, with recent updates both TF and PyTorch make it easy to write GPU-compatible code.
3-) Both TensorFlow and PyTorch are built on cuDNN. You can use them without cuDNN, but as far as I know it hurts performance, though I'm not sure about this.
4-) No, they are still different packages: tensorflow-gpu==1.15 and tensorflow==1.15 (although starting with 1.15 the default tensorflow package also includes GPU support). What they did with TF 2 was make TensorFlow more like Keras, so it is more simplified than 1.15 and earlier.
The rest was already answered. Regarding 3): cuDNN optimizes layers and similar operations at the hardware level, and those implementations are pure black magic. It is incredibly hard to write CUDA code that properly utilizes your GPU (how to load data into the GPU, how to actually perform the operations as matrix computations, etc.).
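As a quick illustration of the checks and device placement mentioned in the question and in answer 1) (a sketch assuming TF 2.x and a CUDA build of PyTorch; the model and batch are placeholders):

import tensorflow as tf
import torch

# TensorFlow: list the GPUs it can see (modern replacement for device_lib).
print(tf.config.list_physical_devices("GPU"))

# PyTorch: pick a device and move the model and data onto it explicitly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 2).to(device)   # placeholder model
x = torch.randn(4, 10).to(device)           # placeholder batch
y = model(x)                                # runs on the GPU if one is visible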

Can a model trained using a GPU be used for inference on a CPU?

I want to run inference on the CPU, although my machine has a GPU.
I wonder if it's possible to force TensorFlow to use the CPU rather than the GPU?
By default, TensorFlow will automatically use the GPU for inference, but since my GPU is not good (it runs out of memory), I wonder if there's a setting to force TensorFlow to use the CPU for inference?
For inference, I used:
tf.contrib.predictor.from_saved_model("PATH")
Assuming you're using TensorFlow 2.0, please check out this issue on GitHub:
[TF 2.0] How to globally force CPU?
The solution seems to be to hide the GPU devices from TensorFlow. You can do that using one of the methodologies described below:
TensorFlow 2.0:
# Make only the CPU devices visible to TensorFlow.
my_devices = tf.config.experimental.list_physical_devices(device_type='CPU')
tf.config.experimental.set_visible_devices(devices=my_devices, device_type='CPU')
TensorFlow 2.1:
tf.config.set_visible_devices([], 'GPU')
(Credit to #ymodak and #henrysky, who answered the question on the GitHub issue.)
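An alternative that does not depend on the TensorFlow version is to hide the GPU from CUDA before TensorFlow initializes (a sketch; the final check assumes TF 2.x):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # must be set before TensorFlow touches the GPU

import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))  # should print an empty list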

How can I accelerate inference speed in TensorFlow when I got sparse matrix from pruning?

I got a sparse weight matrix from TensorFlow pruning to reduce SqueezeNet. After strip_pruning_vars, I checked that most of the elements in the weight matrix were successfully pruned to 0. However, the performance of the model didn't increase as I expected. It seems that an additional software library or hardware support for sparse matrix operations is required. Someone told me that using the Intel MKL library would be helpful, but I don't know how to integrate it with TensorFlow. Now I have .pb files of the pruned SqueezeNet. Any type of help will be highly appreciated.
You can try the Intel® Optimization for TensorFlow* wheel.
It is recommended to use an Intel environment for the same.
Please follow the below steps.
Create a conda environment using the command:
conda create -n my_intel_env -c intel python=3.6
Activate the environment.
source activate my_intel_env
Install the wheel
pip install https://storage.googleapis.com/intel-optimized-tensorflow/tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl
For more details, you can refer to https://software.intel.com/en-us/articles/intel-optimization-for-tensorflow-installation-guide
After installation, you can check whether MKL is enabled by running the commands below from the Python prompt.
from tensorflow.python.framework import test_util
test_util.IsMklEnabled()
This should return True if MKL is enabled.
Hope this helps.
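Once the MKL-enabled wheel is installed, the pruned frozen graph can be run with it directly (a sketch for TF 1.x; the file name, tensor names, input shape and thread counts are placeholders):

import numpy as np
import tensorflow as tf  # TF 1.x

with tf.gfile.GFile("squeezenet_pruned.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

# MKL performance is sensitive to threading; tune these for your CPU.
config = tf.ConfigProto(intra_op_parallelism_threads=4,
                        inter_op_parallelism_threads=2)
images = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder input batch
with tf.Session(graph=graph, config=config) as sess:
    out = sess.run("softmax:0", feed_dict={"input:0": images})  # placeholder tensor names

Note that the MKL kernels are dense; whether the zeros introduced by pruning translate into an actual speedup is a separate issue (see the answer below).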
I have met the same problem as you. I used TensorFlow to prune a model, but in fact the pruned model did not get faster prediction speed.
In the TensorFlow model optimization roadmap (https://www.tensorflow.org/model_optimization/guide/roadmap) they say that they will support sparse model execution in the future. So I guess the reason is that TensorFlow does not support it so far, and we can only get a sparse model but no speed improvement.