TensorFlow 1.14 performance issue on RTX 3090

I am running a model written with TensorFlow 1.x on 4x RTX 3090, and it takes much longer to start training than on 1x RTX 3090. Once training starts, however, it finishes sooner on 4x than on 1x. I am using CUDA 11.1 and TensorFlow 1.14 on both setups.
Secondly, with 1x RTX 2080 Ti, CUDA 10.2, and TensorFlow 1.14, training starts much sooner than on 1x RTX 3090 with CUDA 11.1 and TensorFlow 1.14. Roughly, it takes 5 minutes on 1x RTX 2080 Ti, 30-35 minutes on 1x RTX 3090, and 1.5 hours on 4x RTX 3090 to start training for one of the datasets.
I'll be grateful if anyone can help me to resolve this issue.
I am using Ubuntu 16.04, a Core i9-10980XE CPU, and 32 GB of RAM in both the 2080 Ti and 3090 machines.
EDIT: I found out that TF takes a long start-up time on Ampere-architecture GPUs, according to this, but I'm still unclear whether that is what's happening here and, if it is, whether any solution exists for it.

TF 1.x does not have binaries built for CUDA 11.1, so it takes time to compile at the start. Because the RTX 3090 has to compile the kernels from PTX with the JIT compiler, this takes a long time.
A general workaround for this is to increase the JIT cache size using export CUDA_CACHE_MAXSIZE=2147483648 (here 2147483648 is the cache size in bytes; you can set it to any number, taking the memory limit and its usage by other processes into account). Refer to https://www.tensorflow.org/install/gpu for clarification. With this, the start-up time of subsequent runs will be small. But even after this, the binaries produced (at this start) will not be compatible with CUDA 11.1.
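For instance, a rough sketch of setting this from Python instead of the shell (the variables must be set before TensorFlow initializes CUDA; the cache path shown is just the usual default location on Linux):
import os

# Enlarge the CUDA JIT compilation cache (2 GiB here) so kernels compiled from
# PTX on the first run are reused by later runs. Exporting these in the shell
# before launching the script works just as well.
os.environ["CUDA_CACHE_MAXSIZE"] = "2147483648"
os.environ["CUDA_CACHE_PATH"] = os.path.expanduser("~/.nv/ComputeCache")

import tensorflow as tf

# Any GPU op triggers the PTX JIT compilation; the result is cached on disk,
# so the second run of the script should start much faster.
with tf.Session() as sess:
    a = tf.random.normal([1024, 1024])
    print(sess.run(tf.reduce_sum(tf.matmul(a, a))))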
The best option is to migrate the code from TF 1.x to 2.x (2.4+) to make it run on the RTX 30xx series, or to try compiling TF 1.x from source with CUDA 11.1 (not sure about this).

As Thunder explained, TensorFlow 1.x is not supported on NVIDIA Ampere GPUs, and it looks like it never will be: the Ampere streaming multiprocessors (SM_86) are only supported on CUDA 11.1 (see https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849/2), and TensorFlow 1.x hasn't been fully supported on new versions of CUDA for a while now, probably for a similar reason to the one described in the link above. Unfortunately, TensorFlow 1.x is no longer supported or maintained; see https://github.com/tensorflow/tensorflow/issues/43629#issuecomment-700709796
However, if you have to use the StyleGAN2 model, you might have some luck with NVIDIA's TensorFlow build, which apparently supports version 1.15 on Ampere GPUs; see https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
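If you try it, a quick sanity check after installation might look roughly like this (the pip package names come from the NVIDIA blog post above, so confirm them there; the rest is a sketch):
# Assumes the NVIDIA-maintained TF 1.15 build is installed, e.g.
#   pip install nvidia-pyindex
#   pip install nvidia-tensorflow[horovod]
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TensorFlow version:", tf.__version__)   # expect 1.15.x
# List the devices TF can see; the Ampere cards should show up as GPU devices.
for device in device_lib.list_local_devices():
    print(device.device_type, device.name, device.physical_device_desc)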

Here's the proposed solution on Linux:
https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-docker-or-CUDA-install-2005/
On Windows, I managed to get my RTX 3080 Ti working with TF 1.15 using WSL2 with DirectML:
https://learn.microsoft.com/en-us/windows/ai/directml/gpu-tensorflow-wsl
The result is about 1.5 times faster compared to my RTX 2080 Ti.
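A quick way to confirm the DirectML build is actually picking up the GPU (a sketch; the exact device naming may differ on your setup):
# Assumes the DirectML fork of TF 1.15 is installed inside WSL2, e.g.
#   pip install tensorflow-directml
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
# DirectML-capable GPUs are typically exposed as extra local devices ("DML").
for device in device_lib.list_local_devices():
    print(device.device_type, device.name)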

Related

Does CUDA 11 with an RTX 3080 support TensorFlow and Keras?

I attached an RTX 3080 to my computer, but when training with Keras 2.3.1 and TensorFlow 1.15 I get an error: "failed to run cuBLAS_STATUS_EXECUTION_FAILED, did not mem zero GPU location . . . check failed: start_event != nullptr && stop_event != nullptr". I think the problem is that the recently released RTX 3080 and CUDA 11 do not yet support Keras 2.xx and TensorFlow 1.xx. Is this right? And what causes the problem?
At the moment of writing this, the NVIDIA 30xx series only fully supports CUDA version 11.x; see https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849/2
TensorFlow 1.15 was never fully supported on CUDA 10.1 or newer, probably for a similar reason to the one described in the link above. Unfortunately, TensorFlow 1.x is no longer supported or maintained; see https://github.com/tensorflow/tensorflow/issues/43629#issuecomment-700709796
TensorFlow 2.4 is your best bet with an Ampere GPU. It now has a stable release and official support for CUDA 11.0; see https://www.tensorflow.org/install/source#gpu
As TensorFlow 1.x is never going to be updated or maintained by the TensorFlow team, I would strongly suggest moving to TensorFlow 2.x. Personal preferences aside, it's better in almost every way, and it has the tf.compat module for backwards compatibility with TensorFlow 1.x code if rewriting your code base is not an option. However, even that module is no longer maintained, which really shows that version 1.x is dead; see https://www.tensorflow.org/guide/versions#what_is_covered
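For example, a minimal sketch of 1.x-style graph/session code running unchanged under TF 2.x via the compatibility module (the tiny model here is only illustrative):
# TF 2.x with the v1 compatibility shim: existing graph/session code keeps working.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # turn off eager execution so 1.x-style code runs as before

# Illustrative 1.x-style graph; replace with your existing model code.
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
w = tf.Variable(tf.ones([3, 1]), name="w")
y = tf.matmul(x, w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))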
However, if you're dead set on using TensorFlow 1.15, you might have a chance with NVIDIA's TensorFlow build, which apparently supports version 1.15 on Ampere GPUs; see https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/

TensorFlow model slower than in the documentation

I am using the code at https://keras.io/examples/imdb_fasttext/ to test the performance of my PC. I have a GTX 2060, Ubuntu 18.04, TensorFlow 2.0, CUDA 10.1, and cuDNN 7.6. I get 22 sec/epoch using bi-grams; however, according to that page, only 2 sec/epoch are needed on a GTX 980M GPU. I was hoping for about a second per epoch with my configuration.
Can anyone help me understand what could be the issue?
Many thanks,
Roxana

TensorFlow quantization

I would like to optimize a graph using TensorFlow's transform_graph tool. I tried optimizing the graph from MultiNet (and others with similar encoder-decoder architectures). However, the optimized graph is actually slower when using quantize_weights, and much slower still when using quantize_nodes. According to TensorFlow's documentation, quantization may bring no improvement, or may even be slower. Is this normal for the graph/software/hardware below?
Here is my system information for your reference:
OS Platform and Distribution: Linux Ubuntu 16.04
TensorFlow installed from: source build (CPU) for graph conversion; binary Python package (GPU) for inference
TensorFlow version: both using r1.3
Python version: 2.7
Bazel version: 0.6.1
CUDA/cuDNN version: 8.0/6.0 (inference only)
GPU model and memory: GeForce GTX 1080 Ti
I can post all the scripts used to reproduce if necessary.
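For context, the conversion step is roughly the following (a simplified sketch using the Python wrapper for the transform_graph tool; the file names and input/output node names are placeholders for the actual MultiNet graph):
# Weight-quantization pass with the TF 1.x graph_transforms wrapper.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

with tf.gfile.GFile("frozen_model.pb", "rb") as f:   # placeholder file name
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Using "quantize_nodes" in addition to "quantize_weights" is even slower.
quantized_def = TransformGraph(graph_def, ["input"], ["output"], ["quantize_weights"])

with tf.gfile.GFile("quantized_model.pb", "wb") as f:
    f.write(quantized_def.SerializeToString())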
It seems like quantization in TensorFlow only happens on CPUs. See https://github.com/tensorflow/tensorflow/issues/2807
I had the same problem in a PC environment: my quantized model is 9 times slower than the unquantized one.
But when I ported my quantized model into an Android application, it did speed up.
It seems that this currently only works on CPUs, and only ARM-based CPUs such as those in Android phones.

Cannot use GPU with TensorFlow

I have TensorFlow installed with CUDA 7.5 and cuDNN 5.0. My graphics card is an NVIDIA GeForce 820M with compute capability 2.1. However, I get this error:
Ignoring visible gpu device (device: 0, name: GeForce 820M, pci bus id: 0000:08:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
Device mapping: no known devices.
Is there any way to run on the GPU with compute capability 2.1?
I scoured the internet and found that it is cuDNN that requires this capability, so will installing an earlier version of cuDNN enable me to use the GPU?
tensorflow-gpu requires GPUs of compute capability 3.0 or higher for GPU acceleration, and this has been true since the very first release of TensorFlow.
cuDNN has also required GPUs of compute capability 3.0 or higher since the very first release of cuDNN.
With TensorFlow (using Keras), you might be able to get it to run with PlaidML. I have been able to run TensorFlow with GPU acceleration on AMD and NVIDIA GPUs (some quite old) with PlaidML. It's not as fast as CUDA, but much faster than your CPU.
For reference, I have run it on an old MacBook Pro (2012) with an NVIDIA 650 GPU (1.5 GB) as well as an AMD HD Radeon 750 (3 GB).
The caveat is that it needs to be Keras rather than lower-level TF. There are lots of articles on it, and it now has support from Intel.
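If you go this route, the usual pattern is to select the PlaidML backend before Keras is imported, roughly like this (run plaidml-setup first to pick your device; the tiny model is only illustrative):
import os
# Point Keras at the PlaidML backend before importing it.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

import keras
from keras.models import Sequential
from keras.layers import Dense

# Any Keras model should now execute via PlaidML (OpenCL) instead of TensorFlow.
model = Sequential([Dense(16, activation="relu", input_shape=(8,)),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
print(keras.backend.backend())   # expect "plaidml.keras.backend"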

TensorFlow - which Docker image to use?

From TensorFlow Download and Setup under
Docker installation I see:
b.gcr.io/tensorflow/tensorflow latest 4ac133eed955 653.1 MB
b.gcr.io/tensorflow/tensorflow latest-devel 6a90f0a0e005 2.111 GB
b.gcr.io/tensorflow/tensorflow-full latest edc3d721078b 2.284 GB
I know the second and third (the -devel and -full images) come with the source code, and I am using the second for now.
What is the difference between the second and the third?
Which one is recommended for "normal" use?
TLDR:
First of all - thanks for the Docker images! They are the easiest and cleanest way to start with TF.
A few asides about the images:
there is no PIL
there is no nano (but there is vi), and apt-get cannot find it; yes, I can probably configure repos for it, but why not have it out of the box?
There are four images:
b.gcr.io/tensorflow/tensorflow: TensorFlow CPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel: CPU Binary image plus source code.
b.gcr.io/tensorflow/tensorflow:latest-gpu: TensorFlow GPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel-gpu: GPU binary image plus source code.
And the two properties of concern are:
1. CPU or GPU
2. no source or plus source
CPU or GPU: CPU
For a first-time user it is highly recommended to avoid the GPU version, as it can be anywhere from difficult to impossible to set up. The reason is that not all machines have an NVIDIA graphics chip that meets the requirements. You should first get TensorFlow working to understand it, then move on to the GPU version if you want or need it.
From TensorFlow Build Instructions
Optional: Install CUDA (GPUs on Linux)
In order to build or run TensorFlow with GPU support, both Cuda
Toolkit 7.0 and CUDNN 6.5 V2 from NVIDIA need to be installed.
TensorFlow GPU support requires having a GPU card with
NVidia Compute Capability >= 3.5. Supported cards include but are not limited to:
NVidia Titan
NVidia Titan X
NVidia K20
NVidia K40
no source or plus source: no source
The docker images will work without needing the source. You should only want or need the source if you need to rebuild TensorFlow for some reason such as adding a new OP.
The standard recommendation for someone new to using TensorFlow is to start with the CPU version without the source.