I am sometimes getting out of memory while training a model - tensorflow

Tensorflow Version : TF1.13
Using Anaconda
I studied in stack overflow to set :
TF_CUDNN_WORKSPACE_LIMIT_IN_MB = 100
To reduce the scratch space for tensorflow which is 4GB by default
GPU : NVIDIA GTX 1660 TI 6GB card
CUDA Version : 11.0
But I do not know how to set the environment variable.
I couldn't find any tutorial on it. Since I am a beginner with this can anyone give any links or tell how to set this variable?? It would be really helpful for me.

You can use the OS module in python as follows:
import os
os.environ['TF_CUDNN_WORKSPACE_LIMIT_IN_MB'] = '100'
And by the way, 100MiB is way much less for a model to be trained. At least allocate it 1GiB.

Related

GPU support for TensorFlow & PyTorch

Okay, so I've worked on a bunch of Deep Learning projects and internships now and I've never had to do heavy training. But lately I've been thinking of doing some Transfer Learning for which I'll need to run my code on a GPU. Now I have a system with Windows 10 and a dedicated NVIDIA GeForce 940M GPU. I've been doing a lot of research online, but I'm still a bit confused. I haven't installed the NVIDIA Cuda Toolkit or cuDNN or tensorflow-gpu on my system yet. I currently use tensorflow and pytorch to train my DL models. Here are my queries -
When I define a tensor in tf or pytorch, it is a cpu tensor by default. So, all the training I've been doing so far has been on the CPU. So, if I make sure to install the correct versions of Cuda and cuDNN and tensorflow-gpu (specifically for tensorflow), I can run my models on my GPU using tf-gpu and pytorch and that's it? (I'm aware of the torch.cuda.is_available() in pytorch to ensure pytorch can access my GPU and the device_lib module in tf to check if my gpu is visible to tensorflow)(I'm also aware of the fact that tf doesnt support all Nvidia GPUs)
Why does tf have a separate module for GPU support? PyTorch doesnt seem to have that and all you need to do is cast your tensor from cpu() to cuda() to switch between them.
Why install cuDNN? I know it is a high-level API CUDA built for support to train Deep Neural Nets on the GPU. But do tf-gpu and torch use these in the backend while training on the gpu?
After tf == 1.15, did they combine CPU and GPU support all into one package?
First of all unfortunately 940M is a kinda weak GPU for training. I suggest you use Google colab for faster training but of course, it would be faster than the CPU. So here my answers to your four questions.
1-) Yes if you install the requirements correctly, then you can run on GPU. You can manually place your data to your GPU as well. You can check implementations on TensorFlow. In PyTorch, you should specify the device that you want to use. As you said you should do device = torch.device("cuda" if args.cuda else "cpu") then for models and data you should always call .to(device) Then it will automatically use GPU if available.
2-) PyTorch also needs extra installation (module) for GPU support. However, with recent updates both TF and PyTorch are easy to use for GPU compatible code.
3-) Both Tensorflow and PyTorch is based on cuDNN. You can use them without cuDNN but as far as I know, it hurts the performance but I'm not sure about this topic.
4-) No they are still different packages. tensorflow-gpu==1.15 and tensorflow==1.15 what they did with tf2, was making the tensorflow more like Keras. So it is more simplified then 1.15 or before.
Rest was already answered by regarding 3) cudNN optimizes layer and such operations on hardware level and those implementations are pure black magic. It is incredibly hard to write CUDA code that properly utilizes your GPU (how load data into the GPU, how to actually perform them using matrices etc. )

how do i find out if tensorflow uses the gpu under windows and python 3.6?

how do i find out if tensorflow uses the gpu?
when I check the GPU in the task manager it says that it is 1% full. I find that a little bit, but I do not know whether the display may also be incorrect for the information.
I find the calculation too fast for only CPU, but actually too slow for GPU ...
is installed
tensorflow and tensorflow-gpu with version 1.15
This code will confirm that tensorflow using GPU or CPU
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

How to lower RAM consumption in Tensorflow?

Hello there,
I am trying to use DarkFlow, a Python implementation of YOLO (which uses Tensorflow as backend), on my Nvidia Jetson Nano to detect objects. I got all the setup and stuff, but it doesn't want to train. I set it to GPU mode and a line in the output says this:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 897MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
This is the last line it outputs before the training gets "Killed" without any further messages. Because it's a heavy convolutional NN, I think the reason is RAM over-comsumption. Now I only can use this GPU in my Jetson Nano so, does anybody have a suggestion how to lower it or how to solve the problem otherwise?
Thanks for the answers in advance!
You may try to decrease batch_size to 1 and lower the width,height values but would not recommend a training session on jetson nano. Its limited capabilities(4 GB shared RAM) hinders the learning process. To counter the limitations you could try to follow this post or this one to increase swap_area which acts as RAM but still I would recommend using nano only for inference.
EDIT1: Also it is known that Tensorflow has a tendency to try to allocate all available RAM which makes the process killed by OS. To solve the issue you could use tf.GPUOptions to limit Tensorflow's RAM usage.
Example:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4)
session = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
We have chosen per_process_gpu_memory_fraction as 0.4 because it is best practice not to let Tensorflow allocate more RAM than half of the available resources.(Also because it is being shared)
Best of luck.

How to get the exact GPU memory usage for Keras

I recently started learning Keras and TensorFlow. I am testing out a few models currently on the MNIST dataset (pretty basic stuff). I wanted to know, exactly how much my model is consuming memory-wise, during training and inference. I tried googling but did not find much info.
I came across Nvidia-smi. I tried using config.gpu_options.allow_growth = True option but still am not able to use the exact memory python.exe is consuming due to some issues with Nvidia-smi. I know that I could run a separate pass of train and inference, but this is too cumbersome. It is very easy if I could just find the right API to do the job.
Tensorflow being such a well known and well-used library, I am hoping to find a better and faster way to get to these numbers.
Finally, once again my question is:
How to get the exact memory usage for a Keras model during training and inference.
Relevant specs:
OS: Windows 10
GPU: GTX 1050
TensorFlow version: 1.14
Please let me know if any other details are required.
Thanks!

Low NVIDIA GPU Usage with Keras and Tensorflow

I'm running a CNN with keras-gpu and tensorflow-gpu with a NVIDIA GeForce RTX 2080 Ti on Windows 10. My computer has a Intel Xeon e5-2683 v4 CPU (2.1 GHz). I'm running my code through Jupyter (most recent Anaconda distribution). The output in the command terminal shows that the GPU is being utilized, however the script I'm running takes longer than I expect to train/test on the data and when I open the task manager it looks like the GPU utilization is very low. Here's an image:
Note that the CPU isn't being utilized and nothing else on the task manager suggests anything is being fully utilized. I don't have an ethernet connection and am connected to Wifi (don't think this effects anything but I'm not sure with Jupyter since it runs through the web broswers). I'm training on a lot of data (~128GB) which is all loaded into the RAM (512GB). The model I'm running is a fully convolutional neural network (basically a U-Net architecture) with 566,290 trainable parameters. Things I tried so far:
1. Increasing batch size from 20 to 10,000 (increases GPU usage from ~3-4% to ~6-7%, greatly decreases training time as expected).
2. Setting use_multiprocessing to True and increasing number of workers in model.fit (no effect).
I followed the installation steps on this website: https://www.pugetsystems.com/labs/hpc/The-Best-Way-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-1187/#look-at-the-job-run-with-tensorboard
Note that this installation specifically DOESN'T install CuDNN or CUDA. I've had trouble in the past with getting tensorflow-gpu running with CUDA (although I haven't tried in over 2 years so maybe it's easier with the latest versions) which is why I used this installation method.
Is this most likely the reason why the GPU isn't being fully utilized (no CuDNN/CUDA)? Does it have something to do with the dedicated GPU memory usage being a bottleneck? Or maybe something to do with the network architecture I'm using (number of parameters, etc.)?
Please let me know if you need any more information about my system or the code/data I'm running on to help diagnose. Thanks in advance!
EDIT: I noticed something interesting in the task manager. An epoch with batch size of 10,000 takes around 200s. For the last ~5s of each epoch, the GPU usage increases to ~15-17% (up from ~6-7% for the first 195s of each epoch). Not sure if this helps or indicates there's a bottleneck somewhere besides the GPU.
You for sure need to install CUDA/Cudnn to fully utilize GPU with tensorflow. You can double check that the packages are installed correctly and if the GPU is available to tensorflow/keras by using
import tensorflow as tf
tf.config.list_physical_devices("GPU")
and the output should look something like [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
if the device is available.
If you've installed CUDA/Cudnn correctly then all you need to do is change copy --> cuda in the dropdown menu in the task manager which will show the number of active cuda cores. The other indicators for the GPU will not be active when running tf/keras because there is no video encoding/decoding etc to be done; it is simply using the cuda cores on the GPU so the only way to track GPU usage is to look at the cuda utilization (when considering monitoring from the task manager)
I would first start by running one of the short "tests" to ensure Tensorflow is utilizing the GPU. For example, I prefer #Salvador Dali's answer in that linked question
import tensorflow as tf
with tf.device('/gpu:0'):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
with tf.Session() as sess:
print (sess.run(c))
If Tensorflow is indeed using your GPU you should see the result of the matrix multplication printed. Otherwise a fairly long stack trace stating that "gpu:0" cannot be found.
If this all works well that I would recommend utilizing Nvidia's smi.exe utility. It is available on both Windows and Linux and AFAIK installs with the Nvidia driver. On a windows system it is located at
C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe
Open a windows command prompt and navigate to that directory. Then run
nvidia-smi.exe -l 3
This will show you a screen like so, that updates every three seconds.
Here we can see various information about the state of the GPUs and what they are doing. Of specific interest in this case is the "Pwr: Usage/Cap" and "Volatile GPU-Util" columns. If your model is indeed using the/a GPU these columns should increase "instantaneously" once you start training the model.
You most likely will see an increase in fan speed and temperature unless you have a very nice cooling solution. In the bottom of the printout you should also see a Process with a name akin to "python" or "Jupityr" running.
If this fails to provide an answers as to the slow training times than I would surmise the issue lies with the model and code itself. And I think its is actually the case here. Specifically viewing the Windows Task Managers listing for "Dedicated GPU Memory Usage" pinged at basically maximum.
If you have tried #KDecker's and #OverLordGoldDragon's solution, low GPU usage is still there, I would suggest first investigating your data pipeline. The following two figures are from tensorflow official guides data performance, they are well illustrated how data pipeline will affect the GPU efficiency.
As you can see, prepare data in parallel with the training will increase the GPU usage. In this situation, CPU processing is becoming the bottleneck. You need to find a mechanism to hide the latency of preprocessing, such as changing the number of processes, size of butter etc. The efficiency of CPU should match the efficiency of the GPU. In this way, the GPU will be maximally utilized.
Take a look at Tensorpack, and it has detailed tutorials of how to speed up your input data pipeline.
Everything works as expected; your dedicated memory usage is nearly maxed, and neither TensorFlow nor CUDA can use shared memory -- see this answer.
If your GPU runs OOM, the only remedy is to get a GPU with more dedicated memory, or decrease model size, or use below script to prevent TensorFlow from assigning redundant resources to the GPU (which it does tend to do):
## LIMIT GPU USAGE
config = tf.ConfigProto()
config.gpu_options.allow_growth = True # don't pre-allocate memory; allocate as-needed
config.gpu_options.per_process_gpu_memory_fraction = 0.95 # limit memory to be allocated
K.tensorflow_backend.set_session(tf.Session(config=config)) # create sess w/ above settings
The unusual increased usage you observe may be shared memory resources being temporarily accessed due to exhausting other available resources, especially with use_multiprocessing=True - but unsure, could be other causes
There seems to have been a change to the installation method you referenced : https://www.pugetsystems.com/labs/hpc/The-Best-Way-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-1187
It is now much easier and should eliminate the problems you are experiencing.
Important Edit You don't seem to be looking at the actual compute of the GPU, look at the attached image:
read following two pages ,u will get idea to properly setup with GPU
https://medium.com/#kegui/how-do-i-know-i-am-running-keras-model-on-gpu-a9cdcc24f986
https://datascience.stackexchange.com/questions/41956/how-to-make-my-neural-netwok-run-on-gpu-instead-of-cpu