tf.Session() takes a very long time - tensorflow

I am using an AWS EC2 instance with 16 Tesla GPUs.
When I finished installing TensorFlow and was trying to check that it works, it stopped.
import tensorflow as tf
This works.
hello = tf.constant('Hello, TensorFlow!')
This works.
sess = tf.Session()
This prints the following and then crashes with a segmentation fault:
2018-01-30 06:48:00.543428: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-30 06:48:00.545858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:0f.0
totalMemory: 11.17GiB freeMemory: 11.03GiB
Segmentation fault (core dumped)

These two log lines are purely informational. If you want to, you can suppress them as described in this answer: https://github.com/tensorflow/tensorflow/issues/7778#issuecomment-281678077
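For reference, a minimal sketch of that suppression (assumption: you only want to hide the informational lines; the variable must be set before TensorFlow is first imported):

import os
# '0' = show all logs, '1' = hide INFO, '2' = hide INFO+WARNING, '3' = hide all but FATAL.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf  # the NUMA/device lines above are no longer printed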

Related

No module named tensorflow after installation?

I installed tensorflow-gpu but I get an error in PyCharm:
ModuleNotFoundError: No module named 'tensorflow'
I checked in terminal:
$ pip3 list|grep tensorflow
tensorflow-gpu 1.4.0
tensorflow-tensorboard 0.4.0
Edit (after installation using venv):
Successfully installed tensorflow-gpu-1.12.0
(venv) wojtek@wojtek-GF63-8RC:~$ python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
2018-12-17 21:49:14.893016: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-17 21:49:14.961123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-17 21:49:14.961466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.58GiB
2018-12-17 21:49:14.961479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-17 21:49:15.148507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-17 21:49:15.148538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-17 21:49:15.148544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-17 21:49:15.148687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3306 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
tf.Tensor(918.94904, shape=(), dtype=float32)
You'll want to configure the interpreter in PyCharm (source: the PyCharm docs); a quick sanity check is sketched after the steps below.
1) In the Project Interpreters page, select one of the configured interpreters or virtual environments.
2) Click Edit.
3) In the Edit Python Interpreter dialog box that opens, type the desired interpreter name.
Changing interpreter's name
The Python interpreter name specified in the Name field becomes visible in the list of available interpreters.
If necessary, change the path to the Python executable.
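As a sanity check, you can run a few lines like these (assuming the venv from the edit above) with the interpreter PyCharm is configured to use; if the import fails, PyCharm is pointing at a different Python than the one pip3 installed into:

import sys
print(sys.executable)  # should point into the venv that has tensorflow-gpu

import tensorflow as tf
print(tf.__version__)  # e.g. 1.12.0 after the venv install shown above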

Almost no free 1080 ti memory allocation when running a tensorflow-gpu device

I am testing a recently bought ASUS ROG STRIX 1080 Ti (11 GB) card via a simple Python test program (matmul.py) from https://learningtensorflow.com/lesson10/ .
The virtual environment (venv) setup is as follows: ubuntu=16.04, tensorflow-gpu==1.5.0, python=3.6.6, CUDA==9.0, Cudnn==7.2.1.
CUDA_ERROR_OUT_OF_MEMORY occurred.
And, strangest of all: totalMemory: 10.91GiB freeMemory: 61.44MiB ..
I am not sure whether it was due to the environment setup or to the 1080 Ti itself. I would appreciate it if any experts could advise here.
The terminal showed -
(venv) xx@xxxxxx:~/xx$ python matmul.py gpu 1500
2018-10-01 09:05:12.459203: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-01 09:05:12.514203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-01 09:05:12.514445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 61.44MiB
2018-10-01 09:05:12.514471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-01 09:05:12.651207: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 11.44M (11993088 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
......
It can happen that a Python process gets stuck on the GPU. Always check the processes with nvidia-smi and kill them manually if necessary.
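A small sketch of that check, assuming Linux with nvidia-smi on the PATH (running the command directly in a terminal works just as well):

import subprocess

# List the compute processes currently holding GPU memory.
out = subprocess.check_output(
    ['nvidia-smi', '--query-compute-apps=pid,process_name,used_memory',
     '--format=csv'])
print(out.decode())
# If a stale PID shows up here, kill it manually (e.g. kill -9 <pid>).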
I solved this problem by putting a cap on the memory usage:
import tensorflow as tf

def gpu_config(memory_fraction=0.8):
    # Let ops without a GPU kernel fall back to the CPU.
    config = tf.ConfigProto(
        allow_soft_placement=True, log_device_placement=False)
    # Grow allocations on demand instead of reserving all memory up front.
    config.gpu_options.allow_growth = True
    config.gpu_options.allocator_type = 'BFC'
    # Cap this process at a fraction of total GPU memory.
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    print("GPU memory upper bound:", memory_fraction)
    return config
Then you can just do:
config = gpu_config()
with tf.Session(config=config) as sess:
....
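For completeness, a minimal end-to-end sketch under the same setup (the matmul here is just a placeholder workload, not the author's matmul.py):

import tensorflow as tf

config = gpu_config()
with tf.Session(config=config) as sess:
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])
    # Runs on the GPU while staying within the configured memory cap.
    print(sess.run(tf.reduce_sum(tf.matmul(a, b))))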
After a reboot, I was able to run the sample code from tensorflow.org - https://www.tensorflow.org/guide/using_gpu - without memory issues.
Before running the TensorFlow sample code to check the 1080 Ti, I had difficulty training Mask R-CNN models, as posted here:
Mask RCNN Resource exhausted (OOM) on my own dataset
After replacing cuDNN 7.2.1 with 7.0.5, the resource exhausted (OOM) issue no longer occurred.

Your kernel may have been built without NUMA support

I have Jetson TX2, python 2.7, Tensorflow 1.5, CUDA 9.0
TensorFlow seems to be working, but every time I run the program I get this warning:
with tf.Session() as sess:
print (sess.run(y,feed_dict))
...
2018-08-07 18:07:53.200320: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node Your kernel may have been built without NUMA support.
2018-08-07 18:07:53.200427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 1.79GiB
2018-08-07 18:07:53.200474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-08-07 18:07:53.878574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
Should I be worried? Or is it something negligible?
It shouldn't be a problem for you, since you don't need NUMA support for this board (it has only one memory controller, so memory accesses are uniform).
Also, I found a post on the NVIDIA forum that seems to confirm this.
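To illustrate what the warning is about, here is a quick check of the sysfs file TensorFlow probes at startup (the path is taken from the warning above; on the TX2 it simply doesn't exist):

path = '/sys/bus/pci/devices/0000:00:00.0/numa_node'
try:
    with open(path) as f:
        # -1 also just means "no NUMA info"; TensorFlow then uses node 0.
        print('NUMA node:', f.read().strip())
except IOError:
    print('No NUMA info exposed; TensorFlow defaults to node 0.')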

TF 1.7: Not found: TF GPU device with id 0 was not registered

I built the TF 1.7 libraries from source.
When I use them with the Rust bindings, I get:
2018-04-16 23:40:09.254248: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-16 23:40:09.254550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7465
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 4.95GiB
2018-04-16 23:40:09.254562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-16 23:40:09.383859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-16 23:40:09.383889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-16 23:40:09.383894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-16 23:40:09.384066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4711 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-16 23:40:09.463369: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
Everything looks OK except the last line.
What is this error? It doesn't appear in Python.
Someone built TF 1.7 on Mac with Python and got the same error: https://gist.github.com/pavelmalik/d51036d508c8753c86aed1f3ff1e6967

I'm trying to use TensorFlow on my Ubuntu 16.04 machine, but even after installing tensorflow-gpu I can't use the GPU

I'm trying to use TensorFlow with GPU support on my Ubuntu 16.04 machine. I've successfully installed the CUDA toolkit (8.0.61) and cuDNN (6.0.21). The problem is, I can't use the TensorFlow GPU even after this installation process.
Importing TensorFlow prints no lines at all on this machine, while on my other Ubuntu machine it prints some lines when importing TensorFlow.
(Screenshot: Nvidia driver version and usage)
There are no lines when importing TensorFlow. Only starting a session gives you some output:
>>> import tensorflow as tf
>>> tf.Session()
2017-09-17 20:06:20.174697: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2017-09-17 20:06:20.343531: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-17 20:06:20.344062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Found device 0 with properties:
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.329
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.61GiB
2017-09-17 20:06:20.344084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
<tensorflow.python.client.session.Session object at 0x7f387d89f210>
After consulting my crystal ball, I suggest you further check the environment variable TF_CPP_MIN_LOG_LEVEL.
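A one-liner to check that (assumption: the variable may have been exported somewhere in your shell profile); a value of '2' or '3' would suppress exactly the startup lines you are missing:

import os
print(os.environ.get('TF_CPP_MIN_LOG_LEVEL'))  # unset or '0' shows everything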