CUDA_ERROR_OUT_OF_MEMORY ubuntu 14.04 cuda8 - tensorflow

I am using tensorflow with CUDA 8 on Ubuntu 14.04.
My GPU: GeForce GT 740M
I am a newbie to GPUs.
Sometimes, after I have run the same script several times on the GPU, I get a memory error, which is gone the next time I reboot.
I don't really know how to solve this problem. Thanks for sharing your expertise with me.
Here is the error message:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910]
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885]
Found device 0 with properties:
name: GeForce GT 740M
major: 3 minor: 5 memoryClockRate (GHz) 1.0325
pciBusID 0000:01:00.0
Total memory: 1.96GiB
Free memory: 118.75MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 118.75M (124518400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)

There are several reasons you could be getting this issue:
Check whether your GPU is also driving the X server, since that eats into its memory from the start. Run nvidia-smi to see how much memory you actually have to work with.
Make sure you have the appropriate CUDA driver and toolkit versions for the TensorFlow build you are running (driver 367.35 or newer, and toolkit 8.0).
Is your card supported? TensorFlow needs CUDA compute capability >= 3.0, and your GT 740M reports 3.5 in the log above, so it should work, but NVIDIA likes to be sneaky about supporting old hardware and locking you out as a way to make you buy newer GPUs, so double-check that your card is on the supported list.
You can debug your code with the TensorFlow debugger (tfdbg); a minimal sketch follows this list.
Last but not least, as the comments have suggested, it seems like your GPU resources aren't being freed when your program ends. Make sure you kill the stale process, since the GPU only frees its resources after the program exits.
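For the debugger route, here is a minimal sketch of wrapping a session in tfdbg (the TF 1.x debugger API; the tiny constant graph is just a stand-in for your own model):

import tensorflow as tf
from tensorflow.python import debug as tf_debug

x = tf.constant([1.0, 2.0])  # stand-in for your real graph
y = x * 2.0

sess = tf.Session()
# Wrap the session so each run() drops into the tfdbg CLI,
# where you can step through ops and inspect tensor values.
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
print(sess.run(y))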

Related

Almost no free 1080 ti memory allocation when running a tensorflow-gpu device

I am testing a recently bought ASUS ROG STRIX 1080 ti (11 GB) card via a simple test python (matmul.py) program from https://learningtensorflow.com/lesson10/ .
The virtual environment (venv) setup is as follows : ubuntu=16.04, tensorflow-gpu==1.5.0, python=3.6.6, CUDA==9.0, Cudnn==7.2.1.
CUDA_ERROR_OUT_OF_MEMORY occurred.
And, strangest of all: totalMemory: 10.91GiB freeMemory: 61.44MiB.
I am not sure whether it was due to the environment setup or to the 1080 ti itself. I would appreciate it if any experts could advise here.
The terminal showed -
(venv) xx#xxxxxx:~/xx$ python matmul.py gpu 1500
2018-10-01 09:05:12.459203: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-01 09:05:12.514203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-01 09:05:12.514445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 61.44MiB
2018-10-01 09:05:12.514471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-01 09:05:12.651207: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 11.44M (11993088 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
......
It can happen that a Python process gets stuck on the GPU. Always check the processes with nvidia-smi and kill them manually if necessary.
I solved this problem by putting a cap on the memory usage:
def gpu_config():
    config = tf.ConfigProto(
        allow_soft_placement=True, log_device_placement=False)
    config.gpu_options.allow_growth = True
    config.gpu_options.allocator_type = 'BFC'
    config.gpu_options.per_process_gpu_memory_fraction = 0.8
    print("GPU memory upper bound:",
          config.gpu_options.per_process_gpu_memory_fraction)
    return config
Then you can just do:
config = gpu_config()
with tf.Session(config=config) as sess:
    ....
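Note that the two options do different things: allow_growth=True makes TensorFlow allocate GPU memory on demand instead of reserving nearly all free memory up front, while per_process_gpu_memory_fraction caps the total share of device memory the process may ever claim. Either alone is often enough to avoid this error.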
After a reboot, I was able to run the sample code from tensorflow.org - https://www.tensorflow.org/guide/using_gpu - without memory issues.
Before running the tensorflow sample code to check the 1080 ti, I had difficulty training Mask-RCNN models, as posted here:
Mask RCNN Resource exhausted (OOM) on my own dataset
After replacing cudnn 7.2.1 with 7.0.5, the resource exhausted (OOM) issue no longer occurred.

Your kernel may have been built without NUMA support

I have Jetson TX2, python 2.7, Tensorflow 1.5, CUDA 9.0
Tensorflow seems to be working, but every time I run the program, I get this warning:
with tf.Session() as sess:
    print(sess.run(y, feed_dict))
...
2018-08-07 18:07:53.200320: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node Your kernel may have been built without NUMA support.
2018-08-07 18:07:53.200427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6
minor: 2
memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB
freeMemory: 1.79GiB
2018-08-07 18:07:53.200474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-08-07 18:07:53.878574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
Should I be worried? Or is it something negligible?
It shouldn't be a problem for you, since you don't need NUMA support for this board (it has only one memory controller, so memory accesses are uniform).
Also, I found a post on the nvidia forum that seems to confirm this.
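If you want to see what TensorFlow is reading there, the log above shows the exact sysfs path; a minimal check (the PCI bus ID 0000:00:00.0 is taken from your log):

# Read the same sysfs file TensorFlow consults; -1 (or a missing file)
# means the kernel reports no NUMA node for this device.
with open('/sys/bus/pci/devices/0000:00:00.0/numa_node') as f:
    print(f.read().strip())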

I'm trying to use tensorflow on my ubuntu 16.04 machine, but even after installing tensorflow-gpu, I can't use the GPU for tensorflow

I'm trying to use tensorflow GPU on my ubuntu 16.04 machine. I've successfully installed the CUDA toolkit (8.0.61) and cuDNN (6.0.21). The problem is, I can't use the tensorflow GPU even after this installation process.
Importing tensorflow does not print any lines about the GPU. On my other ubuntu machine, some lines are printed when importing tensorflow, but on this machine nothing shows up.
[screenshot: nvidia-smi output showing driver version and usage]
There are no lines when importing TensorFlow. Only starting a session gives you some output:
>>> import tensorflow as tf
>>> tf.Session()
2017-09-17 20:06:20.174697: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2017-09-17 20:06:20.343531: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-17 20:06:20.344062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Found device 0 with properties:
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.329
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.61GiB
2017-09-17 20:06:20.344084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
<tensorflow.python.client.session.Session object at 0x7f387d89f210>
After consulting my crystal ball, I suggest you check the environment variable TF_CPP_MIN_LOG_LEVEL.
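If that variable is set to 2 or higher, the informational lines you are looking for are suppressed (0 = show everything, 1 = hide INFO, 2 = also hide WARNING, 3 = also hide ERROR). A quick sketch to check and reset it:

import os
# The value is read by TensorFlow's C++ core at import time,
# so it must be set before tensorflow is imported.
print(os.environ.get('TF_CPP_MIN_LOG_LEVEL'))  # None is equivalent to '0'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
import tensorflow as tf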

TensorFlow - Mac GPU installation

After I run a python3 script I get the following statements and do not know where the 3 errors are coming from. I am using cudnn v5.0 but obviously I have gone wrong somewhere along the installation pipeline. Any help would be fantastic.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.dylib locally
number of elements at final reshape = %d. 61440
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.28GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:354] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:361] error retrieving driver version: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""
E tensorflow/stream_executor/cuda/cuda_dnn.cc:321] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Abort trap: 6
You probably installed the wrong cuDNN!
I had the same problem and was able to fix mine by replacing my cuDNN with a compatible version.
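To confirm which cuDNN your process actually loads, one option is to ask the library itself for its version; a sketch using ctypes (the .dylib name assumes macOS, use 'libcudnn.so' on Linux, and the library must be on your loader path):

import ctypes
# Load the same cuDNN library TensorFlow would dlopen and ask it
# for its version number (e.g. 5005 means cuDNN 5.0.5).
libcudnn = ctypes.CDLL('libcudnn.dylib')
print(libcudnn.cudnnGetVersion())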

Tensorflow takes >1 min on first run on video card with 5.0 compute capability

I'm running tensorflow 0.8.0 for python3 (pip installation), and the following file test.py:
import tensorflow as tf
a = tf.convert_to_tensor([1], dtype=tf.int32)
b = tf.to_float(a)
with tf.Session():
    b.eval()
... takes more than a minute to run:
$time python3 test.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.61GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)
real 1m6.985s
user 1m6.700s
sys 0m1.480s
I should mention other tensorflow programs seem to work fine, e.g.
$time python3 -m tensorflow.models.image.mnist.convolutional
takes under 4 minutes.
Edit:
$cat /usr/local/cuda/version.txt
CUDA Version 7.5.18
$ls /usr/local/cuda/lib64/libcudnn*
/usr/local/cuda/lib64/libcudnn.so /usr/local/cuda/lib64/libcudnn.so.4.0.7
/usr/local/cuda/lib64/libcudnn.so.4 /usr/local/cuda/lib64/libcudnn_static.a
I think your GPU, the GTX 860M, is an sm_50 device. The default TensorFlow binary is built with SASS for sm_35 and sm_52 only, so for your card the binary ships only PTX, and the CUDA runtime has to JIT-compile it into SASS the first time each kernel runs, which takes a minute or so. The compiled kernels should be cached for later runs, unless caching was explicitly disabled.
The first call to eval() or run() is also typically much slower than subsequent calls, since it needs to set up the session; later calls to eval/run are much faster.
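To see both effects, you can time consecutive calls in one process; a sketch built on the test.py above:

import time
import tensorflow as tf

a = tf.convert_to_tensor([1], dtype=tf.int32)
b = tf.to_float(a)
with tf.Session():
    start = time.time()
    b.eval()  # pays the PTX-to-SASS JIT and session-setup cost
    mid = time.time()
    b.eval()  # should be far faster
    end = time.time()
    print('first: %.2fs, second: %.4fs' % (mid - start, end - mid))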