TensorFlow Performance Does Not Improve with Additional GPUs

I'm running the standard TensorFlow benchmark on my desktop, configured as follows:
Intel i7-7700K
Asus B250 Mining edition
16 Gigabyte P106 GPUs
32 GB memory
Ubuntu 16.04, CUDA 9.0, and cuDNN 7.1
TensorFlow 1.10
However, the results for 8 cards and 16 cards are the same.
Any idea why this is happening?

This depends on your setup and the parameters you're passing to the benchmark.
Verify that the NVIDIA drivers are working properly: nvidia-smi.
All of your GPUs should be listed there.
Verify that tf-nightly-gpu is installed: pip list. This is a requirement according to the benchmark documentation.
While the model is training, run nvidia-smi again to check whether the GPUs are actually being utilized, and how many of them.
Try changing the value of the variable_update parameter.
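The utilization check above can be scripted. The following is a minimal sketch, not part of the benchmark: it assumes nvidia-smi is on the PATH, and the helper names (query_gpus, parse_gpu_rows, busy_gpus) and the 5% threshold are illustrative choices.

```python
import subprocess

# Query flags for nvidia-smi: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def query_gpus():
    """Run nvidia-smi and return its raw CSV output (requires NVIDIA drivers)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def parse_gpu_rows(csv_text):
    """Turn 'index, name, utilization' rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, util = rest.rsplit(",", 1)
        util = util.strip()
        rows.append({
            "index": int(index),
            "name": name.strip(),
            # Some cards report a non-numeric value such as [N/A].
            "util_percent": int(util) if util.isdigit() else None,
        })
    return rows


def busy_gpus(rows, threshold=5):
    """Return the indices of GPUs at or above the utilization threshold."""
    return [
        r["index"]
        for r in rows
        if r["util_percent"] is not None and r["util_percent"] >= threshold
    ]
```

Calling busy_gpus(parse_gpu_rows(query_gpus())) while the benchmark is running returns the indices of the GPUs that are doing work. If a 16-card run lists only 8 indices, the extra cards are idle.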

I installed tf-nightly-gpu and set variable_update=independent.

Related

Using the RTX 3070 laptop GPU for CNN model training with a windows system

I'm trying to use my laptop's RTX 3070 GPU for CNN model training because I have to run an exhaustive grid search to tune the hyperparameters. I tried many different methods, but I could not get it to work. Can anyone kindly point me in the right direction?
I followed this procedure:
Installed the NVIDIA CUDA Toolkit 11.2
Installed NVIDIA cuDNN 8.1 by downloading and pasting the files (bin,include,lib) into the NVIDIA GPU Computing Toolkit/CUDA/V11.2
Set up the environment variable by adding the paths for both bin and libnvvm to the system path.
Installed TensorFlow 2.11 and Python 3.8 in a new conda environment.
However, I was unable to set up the system to use the available GPU. The code seems to use only the CPU, and when I run the following check I get the output below.
query:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Output:
TensorFlow version: 2.11.0
Num GPUs Available: 0
Am I missing something here, or does anyone else have the same issue?
You should use the DirectML plugin. Starting with TensorFlow 2.11, GPU support has been dropped for native Windows, so you need to use the DirectML plugin (or run TensorFlow inside WSL2).
You can follow the tutorial here to install it.
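The version rule in this answer can be written as a small check. This is a sketch only: the function name native_windows_gpu_dropped is hypothetical, and it encodes nothing beyond the statement that TensorFlow 2.11 and later have no GPU support on native Windows.

```python
import platform


def native_windows_gpu_dropped(tf_version, system=None):
    """Return True if this TensorFlow version on this OS has no native GPU support.

    tf_version is a string such as "2.11.0"; system defaults to the current OS
    name as reported by platform.system() ("Windows", "Linux", "Darwin").
    """
    if system is None:
        system = platform.system()
    # Only the major and minor parts matter for this rule.
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return system == "Windows" and (major, minor) >= (2, 11)
```

With the setup from the question, native_windows_gpu_dropped("2.11.0", "Windows") is True, which matches the reported "Num GPUs Available: 0". The same call with "2.10.1" returns False.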

I have high CPU and low GPU usage when training a model. I installed CUDA and tensorflow-gpu

I see low GPU usage and high CPU usage when training this model on the MNIST dataset. I installed CUDA for the GPU, but nothing has changed. Can you help me?
Model
Training
tensorflow-gpu requires cuDNN in addition to CUDA. Downloading cuDNN requires a developer account; get it from https://developer.nvidia.com/cudnn.
Refer to the page below,
https://www.tensorflow.org/install/gpu
and install cuDNN and set the required paths for your OS as described there.
If that does not help, you should post the output log (terminal) of launching your training script.
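Before posting a log, it can help to collect the basic facts in one place. The sketch below is an assumption-laden convenience, not an official diagnostic: gpu_report is a hypothetical helper, it assumes a TensorFlow 2.x API (tf.config.list_physical_devices), and the cuDNN lookup through ctypes.util.find_library is best effort and mainly meaningful on Linux.

```python
import ctypes.util
import importlib


def gpu_report():
    """Summarize whether TensorFlow, CUDA support, cuDNN, and GPUs are visible."""
    report = {
        "tensorflow_installed": False,
        "built_with_cuda": False,
        "gpus": [],
        # Name of the cuDNN shared library if the system linker can find it, else None.
        "cudnn_library": ctypes.util.find_library("cudnn"),
    }
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow itself is missing, so nothing else can be checked.
        return report
    report["tensorflow_installed"] = True
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report
```

An empty "gpus" list together with "built_with_cuda" set to True usually points at the driver, CUDA, or cuDNN installation rather than at the training script.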

I'd like to control how TensorFlow Lite uses the GPU; what should I study for that?

First, let me explain what I have to do.
My development environment is Tizen OS. You may be unfamiliar with it; this OS uses a Linux kernel, is Red Hat based, and targets mobile devices, TVs, etc. My target device consists of an Exynos 5422 and an ARM Mali-T628.
My main task is to implement a GPU library that TensorFlow Lite's operations can use.
I have built and installed TensorFlow Lite as an RPM package.
I have googled TensorFlow and GPUs many times, but I only found information about CUDA, which is useless to me. I didn't see anything for my case (Tizen and a Mali GPU).
I think Linux must have GPU instructions or a library, as it does for the CPU, but I can't find them.
Can you suggest search keywords or documents?
You can go to NVIDIA's CUDA Toolkit page, where you can find the Documentation and Training buttons/options.
There's also the CUDA programming guide, which I myself find very useful and helpful for CUDA.
I believe that one or two of those may help you.
CUDA is for NVIDIA GPUs. Mali is ARM's, not NVIDIA's, so you CANNOT use CUDA on your hardware. Besides, if you want CUDA, you'd better drop TensorFlow Lite and use TensorFlow.
If you want to use CUDA, get hardware with a supported NVIDIA GPU (e.g., an x64 machine with an NVIDIA GPU). Note that you can use tensorflow-gpu and CUDA/cuDNN in Tizen on x64 with an NVIDIA GPU. You just need to be careful about the NVIDIA GPU kernel driver version and userspace driver version. Because NVIDIA's GPU userspace driver and CUDA/cuDNN are statically built, its Linux drivers are compatible with Tizen. (I've tested tensorflow-gpu, CUDA/CUDNN in Tizen with NVidia driver version 111... probably in winter, 2017)
If you want to use Tizen and TensorFlow Lite on the given hardware, forget CUDA.

Does loading a saved TensorFlow model save time?

The problem is that I cannot get tensorflow-gpu working on my Ubuntu system, because the NVIDIA driver cannot be installed there. So I run tensorflow-gpu on Windows 10, but it does not support tensorflow-serving.
I know Docker can help with this, and I did install it, but only with tensorflow-cpu. Running only the tensorflow-cpu version would be very slow.
So I came up with the idea of installing two copies of TensorFlow: the GPU version on the host system and the CPU version in Docker. The GPU version would train and save a model, and the CPU version would then load the saved model.
What I want to know is: does this approach work, and does it save time? Put simply, does it take less time than just running the tensorflow-cpu version in Docker?
TensorFlow GPU with NVIDIA GPUs on Ubuntu is supported, and there are drivers available. Check this tutorial.
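The train-on-GPU, load-on-CPU plan from the question can be sketched with the SavedModel format. This is an illustration under assumptions, not something the answer above states: it uses the TensorFlow 2.x API (tf.saved_model.save and tf.saved_model.load), the Doubler module is a trivial stand-in for a trained model, and the helper names are hypothetical. A SavedModel is generally not tied to the device that wrote it, although models that rely on GPU-only ops may need changes.

```python
def export_doubler(export_dir):
    """Save a tiny stand-in model; run this where training happens (e.g. the GPU install)."""
    import tensorflow as tf  # imported lazily so defining the helpers needs no TensorFlow

    class Doubler(tf.Module):
        # A fixed input signature lets the restored object be called directly.
        @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
        def __call__(self, x):
            return 2.0 * x

    tf.saved_model.save(Doubler(), export_dir)


def load_on_cpu_and_run(export_dir, values):
    """Load the saved model and run it on the CPU (e.g. inside the CPU-only container)."""
    import tensorflow as tf

    with tf.device("/CPU:0"):
        loaded = tf.saved_model.load(export_dir)
        result = loaded(tf.constant(values, dtype=tf.float32))
    return result.numpy().tolist()
```

The CPU side only pays for loading and inference, not for training, which is where the time saving in this plan would come from.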

TensorFlow - which Docker image to use?

From TensorFlow Download and Setup, under Docker installation, I see:
1. b.gcr.io/tensorflow/tensorflow latest 4ac133eed955 653.1 MB
2. b.gcr.io/tensorflow/tensorflow latest-devel 6a90f0a0e005 2.111 GB
3. b.gcr.io/tensorflow/tensorflow-full latest edc3d721078b 2.284 GB
I know that 2. and 3. include the source code, and I am using 2. for now.
What is the difference between 2. and 3.?
Which one is recommended for "normal" use?
TLDR:
First of all, thanks for the Docker images! They are the easiest and cleanest way to start with TF.
A few side notes about the images:
there is no PIL
there is no nano (but there is vi), and apt-get cannot find it. Yes, I can probably configure repos for it, but why isn't it available out of the box?
There are four images:
b.gcr.io/tensorflow/tensorflow: TensorFlow CPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel: CPU Binary image plus source code.
b.gcr.io/tensorflow/tensorflow:latest-gpu: TensorFlow GPU binary image.
gcr.io/tensorflow/tensorflow:latest-devel-gpu: GPU Binary image plus source code.
And the two properties of concern are:
1. CPU or GPU
2. no source or plus source
CPU or GPU: CPU
For a first-time user it is highly recommended to avoid the GPU version, as it can be anywhere from difficult to impossible to use. The reason is that not all machines have an NVIDIA graphics chip that meets the requirements. You should first get TensorFlow working and understand it, then move on to the GPU version if you want or need it.
From TensorFlow Build Instructions
Optional: Install CUDA (GPUs on Linux)
In order to build or run TensorFlow with GPU support, both Cuda
Toolkit 7.0 and CUDNN 6.5 V2 from NVIDIA need to be installed.
TensorFlow GPU support requires having a GPU card with
NVidia Compute Capability >= 3.5. Supported cards include but are not limited to:
NVidia Titan
NVidia Titan X
NVidia K20
NVidia K40
no source or plus source: no source
The docker images will work without needing the source. You should only want or need the source if you need to rebuild TensorFlow for some reason such as adding a new OP.
The standard recommendation for someone new to using TensorFlow is to start with the CPU version without the source.
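The two properties above map onto the four images as a simple lookup. The sketch below only restates that mapping; the function name pick_image is hypothetical, the image names are copied from the list in this answer, and those registry paths date from when the answer was written and may no longer resolve.

```python
# (needs_gpu, needs_source) -> image name, as listed in the answer above.
IMAGES = {
    (False, False): "b.gcr.io/tensorflow/tensorflow",
    (False, True): "b.gcr.io/tensorflow/tensorflow:latest-devel",
    (True, False): "b.gcr.io/tensorflow/tensorflow:latest-gpu",
    (True, True): "gcr.io/tensorflow/tensorflow:latest-devel-gpu",
}


def pick_image(needs_gpu=False, needs_source=False):
    """Return the image for the given choices; the defaults follow the recommendation."""
    return IMAGES[(needs_gpu, needs_source)]
```

Calling pick_image() with no arguments returns the CPU binary image without source, which is the recommended starting point for a new user.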