TensorFlow allocating all memory for any program

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): v1.4.0-rc1
Python version: 3.5.5
CUDA/cuDNN version: CUDA 8.0 / cuDNN 6
GPU model and memory: NVIDIA GTX 1080, 8 GB
I am new to TensorFlow, so this could easily be some silly installation error that I'm not seeing.
I open Python to test the TF installation:
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Resulting in:
I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-04-11 21:39:44.830140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 78.94MiB
2018-04-11 21:39:44.830178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-11 21:39:44.832231: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 78.94M (82771968 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.834394: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 71.04M (74494976 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.835825: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 63.94M (67045632 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.837560: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 57.55M (60341248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.839233: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 51.79M (54307328 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.841757: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 46.61M (48876800 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.843632: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 41.95M (43989248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.845588: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 37.76M (39590400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.847229: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 33.98M (35631360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.849278: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 30.58M (32068352 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-04-11 21:39:44.850967: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 27.52M (28861696 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 6037705122138393497
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 82771968
locality {
bus_id: 1
}
incarnation: 11403601020071115295
physical_device_desc: "device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1"
]

Assuming your question is "Why does TensorFlow allocate all available GPU memory even though much less would be enough for my program?", the answer is that it does this by default to reduce GPU memory fragmentation. You can change this behavior with settings such as config.gpu_options.allow_growth and config.gpu_options.per_process_gpu_memory_fraction, which make TensorFlow less memory hungry at the cost of some potential fragmentation. A detailed explanation is in the "Using GPUs" chapter of the TensorFlow Programmer's Guide; a minimal sketch of both options follows.
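For instance, a minimal TF 1.x sketch (matching the v1.4 install in this question; the 0.4 fraction below is just an example value):
import tensorflow as tf
config = tf.ConfigProto()
# Option 1: start small and grow GPU memory usage on demand
# instead of reserving (almost) all of it up front.
config.gpu_options.allow_growth = True
# Option 2: hard-cap this process at a fraction of total GPU memory.
# config.gpu_options.per_process_gpu_memory_fraction = 0.4
sess = tf.Session(config=config)
print(sess.run(tf.constant("GPU memory options applied")))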

Related

How to use a single GPU from a multi-GPU system in TensorFlow

I have a multi-GPU system and want to run my model on GPU 6. I have tried different ways to select the GPU with id 6, but none of them work. How can I make GPU 6 available to the notebook? The code I used is below.
import os
import tensorflow as tf
os.environ["CUDA_VISIBLE_DEVICES"]="6"
print(tf.test.gpu_device_name())
from tensorflow.python.client import device_lib
devices_tf = device_lib.list_local_devices()
print(devices_tf)
Below is the output I got from the above code:
/device:GPU:0
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15691930590178259318
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 219742208
locality {
bus_id: 1
links {
}
}
incarnation: 10143841604030116090
physical_device_desc: "device: 0, name: GeForce RTX 3090, pci bus id:
0000:23:00.0, compute capability: 8.6"
xla_global_id: 416903419
]

Is there a way to set all my GPUs to NOT be XLA so I can train with multiple GPUs rather than just one?

I would like to train Keras models using multiple GPUs. My understanding is that you currently cannot train on multiple GPUs with XLA. The issue is that I can't figure out how to turn XLA off: every GPU is listed as an XLA GPU.
For reference, I am using 3 RTX 2070s on the latest Ubuntu desktop, and nvidia-smi does show all 3 GPUs.
I have tried uninstalling and reinstalling tensorflow-gpu; that does not help.
from keras.utils.training_utils import multi_gpu_model
model = multi_gpu_model(model, gpus=3)
ValueError:
To call `multi_gpu_model` with `gpus=3`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2']. However this machine only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2']. Try reducing `gpus`.
EDIT: I am using tensorflow-gpu, and I've just confirmed it isn't even using one GPU. I verified this by cranking the batch size up to 10,000: nvidia-smi showed no change, but CPU/memory usage did change in htop.
EDIT2:
tf.test.gpu_device_name()
prints just an empty string
whereas
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
prints all of my devices...
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7781250607362587360
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 12317810384332135154
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1761593194774305176
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 11323027499711415341
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 3573490477127930095
physical_device_desc: "device: XLA_GPU device"
]
I faced this problem too.
Sometimes I could fix it by reinstalling the tensorflow-gpu package:
pip uninstall tensorflow-gpu
pip install tensorflow-gpu
However, sometimes these commands didn't work, so I tried the following one instead and, surprisingly, it worked (likely because the conda package pulls in a matching cudatoolkit and cuDNN). A quick check follows the command.
conda install -c anaconda tensorflow-gpu
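As a sanity check after the reinstall (a minimal sketch reusing the same calls as in the question above), a healthy GPU build should report plain GPU devices rather than only XLA_GPU entries:
import tensorflow as tf
from tensorflow.python.client import device_lib
# Should print a real device name such as '/device:GPU:0', not an empty string.
print(tf.test.gpu_device_name())
# Should list entries with device_type "GPU", not only "XLA_GPU" ones.
print([d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU'])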

How to configure TensorFlow to use a specific GPU?

These are the activated devices that I have:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5415837867258701517
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3198956339
locality {
bus_id: 1
links {
}
}
incarnation: 12462133041849407996
physical_device_desc: "device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0"
]
What I want to do is configure my program to use the GeForce GTX 960M, and also make this configuration permanent for all my previous/future programs, if that is possible.
Try the function tf.config.set_visible_devices:
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
# Keep only the first (and here the only) physical GPU, the GTX 960M.
tf.config.set_visible_devices(physical_devices[:1], 'GPU')
This lets you specify which GPUs you would like TensorFlow to use.
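As a quick follow-up check (a minimal, self-contained sketch; note that set_visible_devices must run before the GPUs are initialized, otherwise it raises a RuntimeError):
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.set_visible_devices(physical_devices[:1], 'GPU')
# Only the selected device should now appear as a logical GPU.
print(tf.config.list_logical_devices('GPU'))  # e.g. a single /device:GPU:0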

How to debug local variables in TensorFlow

I'd like to print the value of a tensor in TensorFlow, but it fails. How can I correct it?
train_vector1, train_vector2, train_vector3, train_vector4, train_vector5,train_labels = decode_records(FLAGS.record_train, FLAGS.epoch, record_params)
sess = tf.Session(config=session_conf)
print(sess.run(train_labels))
When I run tf.py, the process hangs. Why?
2018-06-15 16:52:53.782143: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-15 16:52:54.111552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:0d:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-06-15 16:52:54.111607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-06-15 16:52:54.408837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15128 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:0d:00.0, compute capability: 6.0)

Keras and TensorFlow: What does "Peer access not supported between device ordinals 0 and 1" mean and how do I fix it?

I have 2 GPUs installed, and when I train a model I get the following messages. What do "Peer access not supported between device ordinals 0 and 1" and "Peer access not supported between device ordinals 1 and 0" mean? Is it an error, or something I have to fix? The model itself trains successfully in the end, but I think it uses only one of the GPUs, not both. I want to understand this message and fix the problem if needed. Is there something I need to do?
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library curand64_80.dll locally
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "CountExtremelyRandomStats" device_type: "CPU"') for unknown op: CountExtremelyRandomStats
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "FinishedNodes" device_type: "CPU"') for unknown op: FinishedNodes
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "GrowTree" device_type: "CPU"') for unknown op: GrowTree
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ReinterpretStringToFloat" device_type: "CPU"') for unknown op: ReinterpretStringToFloat
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "SampleInputs" device_type: "CPU"') for unknown op: SampleInputs
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ScatterAddNdim" device_type: "CPU"') for unknown op: ScatterAddNdim
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNInsert" device_type: "CPU"') for unknown op: TopNInsert
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNRemove" device_type: "CPU"') for unknown op: TopNRemove
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TreePredictions" device_type: "CPU"') for unknown op: TreePredictions
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "UpdateFertileSlots" device_type: "CPU"') for unknown op: UpdateFertileSlots
Using TensorFlow backend.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:590] creating context when one is currently active; existing: 0000022BB5DD0500
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:02:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0 1
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y N
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 1: N Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 970, pci bus id: 0000:02:00.0)
This just means that the GPUs cannot communicate directly with each other (pass data from GPU 0 to GPU 1 or vice versa) without first copying it back to the CPU. It is an informational message, not an error; cross-GPU transfers are simply routed through host memory.
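If you want to see what this implies in practice, here is a minimal TF 1.x sketch (hypothetical tensor sizes) that places one op on each GPU; with peer access unsupported, the intermediate tensor is copied GPU 0 -> host -> GPU 1, which you can watch with device-placement logging:
import tensorflow as tf
# One matmul per GPU; without peer-to-peer access, TensorFlow moves the
# intermediate tensor through host (CPU) memory between the two devices.
with tf.device('/gpu:0'):
    a = tf.random_normal([2048, 2048])
    b = tf.matmul(a, a)
with tf.device('/gpu:1'):
    c = tf.matmul(b, b)
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(c)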