Tensorflow quantization

I would like to optimize a graph using Tensorflow's transform_graph tool. I tried optimizing the graph from MultiNet (and others with similar encoder-decoder architectures). However, the optimized graph is actually slower when using quantize_weights, and even much slower when using quantize_nodes. From Tensorflow's documentation, there may be no improvements, or it may even be slower, when quantizing. Any idea if this is normal with the graph/software/hardware below?
Here is my system information for your reference:
OS Platform and Distribution: Linux Ubuntu 16.04
TensorFlow installed from: using TF source code (CPU) for graph conversion, using binary-python(GPU) for inference
TensorFlow version: both using r1.3
Python version: 2.7
Bazel version: 0.6.1
CUDA/cuDNN version: 8.0/6.0 (inference only)
GPU model and memory: GeForce GTX 1080 Ti
I can post all the scripts used to reproduce if necessary.
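For illustration, here is a minimal sketch of the weight-quantization step, written against the Python graph_transforms wrapper rather than the bazel transform_graph binary; the graph file name and the input/output node names are placeholders, not the actual MultiNet ones:
# Sketch: apply quantize_weights to a frozen graph (TF 1.x graph_transforms wrapper).
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Positional arguments: graph_def, input node names, output node names, transforms.
transformed_def = TransformGraph(
    graph_def,
    ["input"],                # placeholder input node name
    ["output"],               # placeholder output node name
    ["quantize_weights"])     # or ["quantize_weights", "quantize_nodes"]

with tf.gfile.GFile("quantized_graph.pb", "wb") as f:
    f.write(transformed_def.SerializeToString())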

It seems like quantization in Tensorflow only happens on CPUs. See: https://github.com/tensorflow/tensorflow/issues/2807

I got the same problem in a PC environment: my quantized model is 9 times slower than the unquantized one.
But when I ported my quantized model into an Android application, it did speed up.
It seems that, currently, quantization only helps on CPU, and only on ARM-based CPUs such as those in Android phones.

Related

Using the RTX 3070 laptop GPU for CNN model training with a Windows system

I'm trying to use my laptop's RTX 3070 GPU for CNN model training because I have to employ an exhaustive grid search to tune the hyperparameters. I tried many different methods; however, I could not get it done. Can anyone kindly point me in the right direction?
I followed the following procedure.
The procedure:
Installed the NVIDIA CUDA Toolkit 11.2
Installed NVIDIA cuDNN 8.1 by downloading it and pasting the files (bin, include, lib) into the NVIDIA GPU Computing Toolkit/CUDA/V11.2 directory
Set up the environment variables by including the paths to both bin and libnvvm in the system path.
Installed tensorflow 2.11 and python 3.8 in a new conda environment.
However, I was unable to set up the system to use the available GPU. The code seems to be using only the CPU, and when I run the following query I get the output below.
query:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Output:
TensorFlow version: 2.11.0
Num GPUs Available: 0
Am I missing something here, or does anyone else have the same issue?
You should use the DirectML plugin. As of TensorFlow 2.11, GPU support has been dropped for native Windows, so you need to use the DirectML plugin.
You can follow the tutorial here to install it.
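As a quick check after following that tutorial (a minimal sketch; it assumes the tensorflow-cpu and tensorflow-directml-plugin packages named in the Microsoft guide are installed), the same device query should now report the GPU:
# Sketch: verify that the DirectML plugin exposes the GPU to TensorFlow.
# Assumes: pip install tensorflow-cpu tensorflow-directml-plugin
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available:", len(gpus))  # should be >= 1 once the plugin loads
for gpu in gpus:
    print("Found device:", gpu)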

How to Build a Keras (TF) model using GPU?

The question is pretty straightforward but nothing has really been answered.
Pretty simple: how do I know that when I build a Sequential() model in TensorFlow via Keras, it's going to use my GPU?
Normally, in Torch, it's easy: just use the 'device' parameter and verify via nvidia-smi's volatile GPU utilization metric. I tried it while building a model in TF, but nvidia-smi shows 0% usage across all GPU devices.
TensorFlow uses the GPU for most operations by default when:
It detects at least one GPU
Its GPU support is installed and configured properly. For information regarding how to install and configure it properly for GPU support: https://www.tensorflow.org/install/gpu
One requirement to emphasize is that a specific version of the CUDA library has to be installed, e.g. TensorFlow 2.5 requires CUDA 11.2. Check here for the CUDA version required by each version of TF.
To know whether it detects GPU devices:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
It will also print out debug messages by default to stderr to indicate whether the GPU support is configured properly and whether it detects GPU devices.
To validate using nvidia-smi that it is really using GPU:
You have to define a sufficiently deep and complex neural network model such that the bottleneck is on the GPU side. This can be achieved by increasing the number of layers and the number of channels in each layer.
When training or running inference on the model, e.g. with model.fit() and model.evaluate(), the GPU utilization reported by nvidia-smi should be high.
To know exactly where each operation will be executed, you can add the following line at the beginning of your code:
tf.debugging.set_log_device_placement(True)
For more information: https://www.tensorflow.org/guide/gpu
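Putting the above together, here is a minimal sketch you can run while watching nvidia-smi; the model size and data are arbitrary and only meant to generate visible GPU load:
# Sketch: log device placement and train a Sequential model large enough
# that GPU utilization shows up in nvidia-smi.
import numpy as np
import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # print where each op runs

# Random data and an arbitrarily sized dense model, just to load the GPU.
x = np.random.rand(10000, 512).astype("float32")
y = np.random.randint(0, 10, size=(10000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(512,)),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# While this runs, nvidia-smi should show non-zero GPU utilization.
model.fit(x, y, batch_size=256, epochs=5)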

Tensorflow 1.14 performance issue on RTX 3090

I am running a model written with TensorFlow 1.x on 4x RTX 3090, and it takes much longer to start training than on 1x RTX 3090, although once training starts it finishes earlier on 4x than on 1x. I am using CUDA 11.1 and TensorFlow 1.14 on both setups.
Secondly, when I use 1x RTX 2080 Ti with CUDA 10.2 and TensorFlow 1.14, it takes less time to start training than 1x RTX 3090 with CUDA 11.1 and TensorFlow 1.14. Roughly, it takes 5 minutes on 1x RTX 2080 Ti, 30-35 minutes on 1x RTX 3090, and 1.5 hours on 4x RTX 3090 to start training on one of the datasets.
I'll be grateful if anyone can help me to resolve this issue.
I am using Ubuntu 16.04, a Core™ i9-10980XE CPU, and 32 GB RAM in both the 2080 Ti and 3090 machines.
EDIT: I found out that TF takes a long start-up time on Ampere-architecture GPUs, according to this, but I'm still unclear whether that is the case here; and if it is, does any solution exist for it?
TF 1.x does not have binaries built for CUDA 11.1, so at start-up it takes time to compile: the RTX 3090 has to build the kernels from PTX with the JIT compiler, which takes a long time.
A general workaround is to increase the JIT cache size using export CUDA_CACHE_MAXSIZE=2147483648 (here 2147483648 is the cache size in bytes; you can set any number, taking the memory limit and its usage by other processes into account). Refer to https://www.tensorflow.org/install/gpu for clarification. With this, start-up time in subsequent runs will be small. But even after this, the binaries produced at that first start will not be compatible with CUDA 11.1.
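If you prefer to keep the setting next to the training script, the same cache size can be exported from Python before TensorFlow initializes CUDA; this is an untested sketch, and setting the variable in your shell before launching Python is equivalent:
# Sketch: enlarge the CUDA JIT cache so the PTX kernels compiled on the
# first start-up are reused in later runs. Must be set before CUDA init.
import os
os.environ["CUDA_CACHE_MAXSIZE"] = "2147483648"  # 2 GiB; adjust to your memory budget

import tensorflow as tf  # import only after setting the environment variable
print(tf.test.is_gpu_available())  # triggers GPU initialization (TF 1.x API)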
The best option is to migrate the code from TF 1.x to 2.x (2.4+) to make it run on the RTX 30XX series, or to try compiling TF 1.x from source with CUDA 11.1 (not sure about this).
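If you take the migration route, a common first step (just a sketch of the compatibility shim, not a full rewrite of the model code) is to run the existing graph/session code on a TF 2.x install, which does ship binaries built against CUDA 11.x, through tf.compat.v1:
# Sketch: run legacy TF 1.x-style code on TensorFlow 2.x via the compat layer.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Existing graph/session code then runs largely unchanged, e.g.:
a = tf.placeholder(tf.float32, shape=[None, 3])
b = tf.reduce_sum(a, axis=1)
with tf.Session() as sess:
    print(sess.run(b, feed_dict={a: [[1.0, 2.0, 3.0]]}))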
As Thunder explained, TensorFlow 1.x is not supported on NVIDIA Ampere GPUs, and it looks like it never will be: the Ampere streaming multiprocessors (SM_86) are only supported from CUDA 11.1 onwards (see https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849/2), and TensorFlow 1.x hasn't been fully supported on new versions of CUDA for a while now, probably for similar reasons as described in that link. Unfortunately, TensorFlow 1.x is no longer supported or maintained; see https://github.com/tensorflow/tensorflow/issues/43629#issuecomment-700709796
However, if you have to use the StyleGAN 2 model, you might have some luck with NVIDIA TensorFlow, which apparently supports version 1.15 on Ampere GPUs; see https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
Here's the proposed solution on linux:
https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-docker-or-CUDA-install-2005/
On Windows, I managed to get my RTX 3080 Ti working with TF 1.15 using WSL2 with DirectML:
https://learn.microsoft.com/en-us/windows/ai/directml/gpu-tensorflow-wsl
The result is about 1.5 times faster compared to my RTX 2080 Ti.

Can I run Tensorflow, Keras, and Pytorch for deep learning projects on the latest and highest-spec iMac Pro without Nvidia GPUs?

I love my iMac and do not mind paying top dollar for it. However, I need to run Tensorflow, Keras, and Pytorch for deep learning projects. Can I run them on the latest, maxed-out-spec iMac Pro?
TensorFlow 1.8 supports ROCm; I don't know how it performs next to NVIDIA's CUDA,
but that means that if you have a GPU (Radeon) that supports ROCm, you can use TensorFlow on the GPU.
Running TensorFlow on GPU is possible but extremely slow, and can be added to the definition of torture.

TensorFlow - which Docker image to use?

From TensorFlow Download and Setup under
Docker installation I see:
b.gcr.io/tensorflow/tensorflow latest 4ac133eed955 653.1 MB
b.gcr.io/tensorflow/tensorflow latest-devel 6a90f0a0e005 2.111 GB
b.gcr.io/tensorflow/tensorflow-full latest edc3d721078b 2.284 GB
I know 2. & 3. are with source code and I am using 2. for now.
What is the difference between 2. & 3. ?
Which one is recommended for "normal" use?
TLDR:
First of all - thanks for Docker images! They are the easiest and cleanest way to start with TF.
A few aside things about the images:
there is no PIL
there is no nano (but there is vi) and apt-get cannot find it. Yes, I can probably configure repos for it, but why not have it out of the box?
There are four images:
b.gcr.io/tensorflow/tensorflow: TensorFlow CPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel: CPU Binary image plus source code.
b.gcr.io/tensorflow/tensorflow:latest-gpu: TensorFlow GPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel-gpu: GPU Binary image plus source code.
And the two properties of concern are:
1. CPU or GPU
2. no source or plus source
CPU or GPU: CPU
For a first-time user it is highly recommended to avoid the GPU version, as it can be anywhere from difficult to impossible to use. The reason is that not all machines have an NVIDIA graphics chip that meets the requirements. You should first get TensorFlow working to understand it, then move on to using the GPU version if you want/need it.
From TensorFlow Build Instructions
Optional: Install CUDA (GPUs on Linux)
In order to build or run TensorFlow with GPU support, both Cuda
Toolkit 7.0 and CUDNN 6.5 V2 from NVIDIA need to be installed.
TensorFlow GPU support requires having a GPU card with
NVidia Compute Capability >= 3.5. Supported cards include but are not limited to:
NVidia Titan
NVidia Titan X
NVidia K20
NVidia K40
no source or plus source: no source
The docker images will work without needing the source. You should only want or need the source if you need to rebuild TensorFlow for some reason such as adding a new OP.
The standard recommendation for someone new to using TensorFlow is to start with the CPU version without the source.
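Once you have pulled the CPU image, a quick smoke test inside the container (a minimal sketch using the session-based API of that TensorFlow generation) confirms the binary works before you worry about GPU or source builds:
# Sketch: hello-world check inside the b.gcr.io/tensorflow/tensorflow container.
import tensorflow as tf

hello = tf.constant("Hello from the TensorFlow Docker image")
a = tf.constant(2)
b = tf.constant(3)

with tf.Session() as sess:
    print(sess.run(hello))
    print("2 + 3 =", sess.run(a + b))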