I have a Jetson TX2 with Python 2.7, TensorFlow 1.5, and CUDA 9.0.
TensorFlow seems to be working, but every time I run the program I get this warning:
with tf.Session() as sess:
    print(sess.run(y, feed_dict))
...
2018-08-07 18:07:53.200320: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node Your kernel may have been built without NUMA support.
2018-08-07 18:07:53.200427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6
minor: 2
memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB
freeMemory: 1.79GiB
2018-08-07 18:07:53.200474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-08-07 18:07:53.878574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
Should I be worried? Or is it something negligible?
It shouldn't be a problem for you, since you don't need NUMA support for this board (it has only one memory controller, so memory accesses are uniform).
Also, I found this post on the NVIDIA forum that seems to confirm this.
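If you want to see what TensorFlow is actually reading, you can inspect the sysfs file from the log yourself. A minimal sketch (the numa_node helper is just illustrative; the PCI bus ID is taken from the log above):
def numa_node(pci_id="0000:00:00.0"):
    # -1 or a missing file means the kernel exposes no NUMA information,
    # which is expected on a single-memory-controller board like the TX2.
    path = "/sys/bus/pci/devices/%s/numa_node" % pci_id
    try:
        with open(path) as f:
            return int(f.read())
    except IOError:
        return None  # file absent: kernel built without NUMA support

print(numa_node())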
Related
I installed tensorflow-gpu, but I get an error in PyCharm:
ModuleNotFoundError: No module named 'tensorflow'
I checked in the terminal:
$ pip3 list|grep tensorflow
tensorflow-gpu 1.4.0
tensorflow-tensorboard 0.4.0
Edit (after installing in a venv):
Successfully installed tensorflow-gpu-1.12.0
(venv) wojtek@wojtek-GF63-8RC:~$ python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
2018-12-17 21:49:14.893016: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-17 21:49:14.961123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-17 21:49:14.961466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.58GiB
2018-12-17 21:49:14.961479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-17 21:49:15.148507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-17 21:49:15.148538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-17 21:49:15.148544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-17 21:49:15.148687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3306 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
tf.Tensor(918.94904, shape=(), dtype=float32)
You'll want to configure the interpreter (source):
1) In the Project Interpreters page, select one of the configured interpreters or virtual environments.
2) Click Edit.
3) In the Edit Python Interpreter dialog box that opens, type the desired interpreter name.
Changing the interpreter's name:
The Python interpreter name specified in the Name field becomes visible in the list of available interpreters.
If necessary, change the path to the Python executable.
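A quick way to verify which interpreter PyCharm is actually using is to run a small script from the Run window and compare its output with your terminal. A minimal sketch:
import sys
print(sys.executable)  # path of the interpreter actually running this script

try:
    import tensorflow as tf
    print("tensorflow", tf.__version__)
except ImportError as e:
    print("tensorflow not importable here:", e)
If the two executable paths differ, PyCharm is pointed at a different interpreter than the one tensorflow-gpu was installed into.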
I am testing a recently bought ASUS ROG STRIX 1080 Ti (11 GB) card with a simple Python test program (matmul.py) from https://learningtensorflow.com/lesson10/ .
The virtual environment (venv) setup is as follows: Ubuntu 16.04, tensorflow-gpu==1.5.0, Python 3.6.6, CUDA 9.0, cuDNN 7.2.1.
CUDA_ERROR_OUT_OF_MEMORY occurred.
And, strangest of all: totalMemory: 10.91GiB freeMemory: 61.44MiB.
I am not sure whether it is due to the environment setup or to the 1080 Ti itself. I would appreciate it if any experts could advise here.
The terminal showed:
(venv) xx@xxxxxx:~/xx$ python matmul.py gpu 1500
2018-10-01 09:05:12.459203: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-01 09:05:12.514203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-01 09:05:12.514445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 61.44MiB
2018-10-01 09:05:12.514471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-01 09:05:12.651207: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 11.44M (11993088 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
......
It can happen that a Python process gets stuck on the GPU. Always check the processes with nvidia-smi and kill them manually if necessary.
I solved this problem by putting a cap on the memory usage:
import tensorflow as tf

def gpu_config(memory_fraction=0.8):
    # Cap this process at a fraction of total GPU memory and let the
    # allocation grow lazily instead of grabbing it all up front.
    config = tf.ConfigProto(
        allow_soft_placement=True, log_device_placement=False)
    config.gpu_options.allow_growth = True
    config.gpu_options.allocator_type = 'BFC'
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    print("GPU memory upper bound:", memory_fraction)
    return config
Then you can just do:
config = gpu_config()
with tf.Session(config=config) as sess:
    ...
After a reboot, I was able to run the sample code from tensorflow.org - https://www.tensorflow.org/guide/using_gpu - without memory issues.
Before running the TensorFlow sample code to check the 1080 Ti, I had difficulty training Mask R-CNN models, as posted here:
Mask RCNN Resource exhausted (OOM) on my own dataset
After replacing cuDNN 7.2.1 with 7.0.5, the resource exhausted (OOM) issue no longer occurred.
I am using AWS EC2 with 16 Tesla GPUs.
When I finished installing TensorFlow and was trying to see if it works, it stopped.
import tensorflow as tf
This works.
hello = tf.constant('Hello, TensorFlow!')
This works too.
sess = tf.Session()
This gives an error:
2018-01-30 06:48:00.543428: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-30 06:48:00.545858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:0f.0
totalMemory: 11.17GiB freeMemory: 11.03GiB
Segmentation fault (core dumped)
These two log lines are just informational. If you want to, you can disable them as described in this great answer: https://github.com/tensorflow/tensorflow/issues/7778#issuecomment-281678077
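For reference, the linked answer boils down to setting TF_CPP_MIN_LOG_LEVEL before TensorFlow is imported. A minimal sketch:
import os
# 0 = all logs, 1 = filter INFO, 2 = also filter WARNING, 3 = also filter ERROR
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf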
I'm trying to use TensorFlow GPU on my Ubuntu 16.04 machine. I've successfully installed the CUDA toolkit (8.0.61) and cuDNN (6.0.21). The problem is, I can't use TensorFlow GPU even after this installation process.
Importing tensorflow does not print any lines on this machine, while on my other Ubuntu machine some lines are printed when importing tensorflow.
(Screenshot: Nvidia driver version and usage)
There are no lines when importing TensorFlow. Only starting a session gives you some output:
>>> import tensorflow as tf
>>> tf.Session()
2017-09-17 20:06:20.174697: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2017-09-17 20:06:20.343531: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-17 20:06:20.344062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Found device 0 with properties:
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.329
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.61GiB
2017-09-17 20:06:20.344084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
<tensorflow.python.client.session.Session object at 0x7f387d89f210>
After consulting my crystal ball, I suggest you further check the environment variable TF_CPP_MIN_LOG_LEVEL.
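If you want positive confirmation that the GPU is visible rather than relying on log output, you can list the devices TensorFlow sees. A minimal check (TF 1.x):
from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices()])
# expect something like ['/device:CPU:0', '/device:GPU:0']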
I am using TensorFlow with CUDA 8 on Ubuntu 14.04.
My GPU: GeForce GT 740M
I am a newbie to GPUs.
Sometimes, after I have run the same script several times on the GPU, I get a memory error, which is gone after the next reboot.
Thanks for sharing your expertise with me. I don't really know how to solve this problem.
Here is the error message:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910]
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885]
Found device 0 with properties:
name: GeForce GT 740M
major: 3 minor: 5 memoryClockRate (GHz) 1.0325
pciBusID 0000:01:00.0
Total memory: 1.96GiB
Free memory: 118.75MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 118.75M (124518400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
There are many reasons you could be getting this issue.
Check whether the GPU is also driving your X server, since that takes up memory from the start. Check with nvidia-smi to see how much memory you actually have to work with.
Make sure you have the appropriate CUDA driver and toolkit versions for the TensorFlow build you are running (driver 367.35 or newer and toolkit 8.0).
Is your card supported? (I think it should work, but NVIDIA likes to be sneaky about supporting old hardware, locking you out as a way to make you buy newer GPUs.) Double-check that your card is supported: TensorFlow needs CUDA compute capability >= 3.0.
You can debug your code with the TensorFlow debugger (tfdbg); see the sketch after this list.
Last but not least, as the comments have suggested, it seems your GPU resources aren't being freed after your software has ended. Make sure you kill the process, since the GPU only frees its resources after the program exits.
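For the debugger suggestion above, a minimal sketch of wrapping a session with tfdbg (TF 1.x):
import tensorflow as tf
from tensorflow.python import debug as tf_debug

sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)  # drops into the tfdbg CLI on sess.run()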