I'm running tensorflow-gpu on Windows 10 using a simple MINST neural network program. When it tries to run, it encounters a CUBLAS_STATUS_ALLOC_FAILED error. A google search doesn't turn up anything.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
return fn(*args)
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
status, run_metadata)
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
[[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
For TensorFlow 2.2 none of the other answers worked when the CUBLAS_STATUS_ALLOC_FAILED problem was encountered. Found a solution on https://www.tensorflow.org/guide/gpu:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
# Currently, memory growth needs to be the same across GPUs
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Memory growth must be set before GPUs have been initialized
print(e)
I ran this code before any further calculations are made and found that the same code that produced CUBLAS error before now worked in same session. The sample code above is a specific example that sets the memory growth across a number of physical GPUs but it also solves the memory expansion problem.
The location of the "allow_growth" property of the session config seems to be different now. It's explained here: https://www.tensorflow.org/tutorials/using_gpu
So currently you'd have to set it like this:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
tensorflow>=2.0
import tensorflow as tf
config = tf.compat.v1.ConfigProto(gpu_options =
tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
# device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)
I found this solution works
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto(
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
# device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)
On windows, currently tensorflow does not allocate all available memory like it says in the documentation, instead you can work around this error by allowing dynamic memory growth as follows:
tf.Session(config=tf.ConfigProto(allow_growth=True))
None of these fixes worked for me, as it seems that the structure of the tensorflow libraries have changed. For Tensorflow 2.0, the only fix that worked for me was as under Limiting GPU memory growth on this page https://www.tensorflow.org/guide/gpu
For completeness and future-proofing, here's the solution from the docs - I imagine changing memory_limit may be necessary for some people - 1 GB was fine for my case.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
# Restrict TensorFlow to only allocate 1GB of memory on the first GPU
try:
tf.config.experimental.set_virtual_device_configuration(
gpus[0],
[tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Virtual devices must be set before GPUs have been initialized
print(e)
Tensorflow 2.0 alpha
Allowing GPU memory growth may fix this issue. For Tensorflow 2.0 alpha / nightly there are two methods you can try, to archive this.
1.)
import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()
2.)
import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you
# want to give to tensorflow.
I suggest you try both, and see if it helps.
Source: https://www.tensorflow.org/alpha/guide/using_gpu
for keras:
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)
In my case, a stale python process was consuming memory. I killed it through task manager, and things are back to normal.
A bit late to the party but this resolves my issue with tensorflow 2.4.0 and a gtx 980ti.
Before limiting the memory I got an error like:
CUBLAS_STATUS_ALLOC_FAILED
My solution was this piece of code:
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
gpus[0],
[tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
I found the solution here: https://www.tensorflow.org/guide/gpu
Related
I am trying to run a keras code on a GPU node within a cluster. The GPU node has 4 GPUs per node. I made sure to have all 4 GPUs within the GPU node available for my use. I run the code below to let tensorflow use the GPU:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
logical_gpus = tf.config.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
print(e)
The 4 GPUs available get listed in the output. However, I got the following error when running the code:
Traceback (most recent call last):
File "/BayesOptimization.py", line 20, in <module>
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/framework/config.py", line 439, in list_logical_devices
return context.context().list_logical_devices(device_type=device_type)
File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1368, in list_logical_devices
self.ensure_initialized()
File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 511, in ensure_initialized
config_str = self.config.SerializeToString()
File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1015, in config
gpu_options = self._compute_gpu_options()
File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1074, in _compute_gpu_options
raise ValueError("Memory growth cannot differ between GPU devices")
ValueError: Memory growth cannot differ between GPU devices
Shouldn't the code list all the available gpus and set memory growth to true for each one?
I am currently using tensorflow libraries and python 3.97:
tensorflow 2.4.1 gpu_py39h8236f22_0
tensorflow-base 2.4.1 gpu_py39h29c2da4_0
tensorflow-estimator 2.4.1 pyheb71bc4_0
tensorflow-gpu 2.4.1 h30adc30_0
Any idea what the problem is and how to solve it? Thanks in advance!
Try only: os.environ["CUDA_VISIBLE_DEVICES"]="0"
instead of tf.config.experimental.set_memory_growth.
This works for me.
In case of multi-GPU devices memory growth should be constant through out all available GPUs. Either set it true for all GPUs or keep it false.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
try:
# Currently, memory growth needs to be the same across GPUs
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
logical_gpus = tf.config.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Memory growth must be set before GPUs have been initialized
print(e)
Tensorflow GPU documentation
I am currently trying to run a text-based sequence to sequence model using tensorflow 2.6 and CuDNN.
The code is running, but taking suspiciously long. When I check my Task Manager, I see the following:
This looks weird to me, because all memory is taking but it's not under heavy load.
Is this expected behaviour?
System:
Windows 10
Python 3.9.9
Tensorflow & Keras 2.6
CUDA 11.6
CuDNN 8.3
NVidia RTX 3080ti
In the code I found the following settings for the GPU
def get_gpu_config():
gconfig = tf.compat.v1.ConfigProto()
gconfig.gpu_options.per_process_gpu_memory_fraction = 0.975 # Don't take 100% of the memory
gconfig.allow_soft_placement = True # Does not aggressively take all the GPU memory
gconfig.gpu_options.allow_growth = True # Take more memory when necessary
return gconfig
My python output tells me it found my graphics card:
And it is also visible in my nvidia-smi output:
Am I maybe missing a configuration?
The times it takes are similar to what I got on a CPU system, which seems off to me.
Sidenote:
The Code I try to run had to be migrated from tensorflow-gpu 1.12, but that went "relatively" smooth.
Yes this behaviour is normal for TensorFlow!
From the TensorFlow docs
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method.
If you don't want TensorFlow to allocate the totality of your VRAM, you can either set a hard limit on how much memory to use or tell TensorFlow to only allocate as much memory as needed.
To set a hard limit
Configure a virtual GPU device as follows:
gpus = tf.config.list_physical_devices('GPU')
if gpus:
# Restrict TensorFlow to only allocate 1GB of memory on the first GPU
try:
tf.config.set_logical_device_configuration(
gpus[0],
[tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
logical_gpus = tf.config.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Virtual devices must be set before GPUs have been initialized
print(e)
Only use as much as needed
You can set the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true
OR
Use tf.config.experimental.set_memory_growth as follows:
gpus = tf.config.list_physical_devices('GPU')
if gpus:
try:
# Currently, memory growth needs to be the same across GPUs
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
logical_gpus = tf.config.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Memory growth must be set before GPUs have been initialized
print(e)
All code and information here is taken from https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
I have read answers like:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
set_session(tf.Session(config=config))
But it just doesn't work. There seems to be so much update in both keras and TF that almost anything written in 2017 doesn't work!
So, how to limit memory usage?
One way to restrict reserving all GPU RAM in tensorflow is to grow the amount of reservation. This method will allow you to train multiple NN using same GPU but you cannot set a threshold on the amount of memory you want to reserve.
Using the following snippet before importing keras or just use tf.keras instead.
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
except RuntimeError as e:
print(e)
Seems that I have the same problem. Have you tried this?
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)
This method will make the application allocate only as much GPU memory based on runtime allocation.
2022 update of #Yustina Ivanova's answer:
Most people will encounter errors such as (one of the following):
AttributeError: module 'tensorflow.keras.backend' has no attribute 'tensorflow_backend'
AttributeError: module 'tensorflow.keras.backend' has no attribute 'set_session'
AttributeError: module 'tensorflow' has no attribute 'ConfigProto'
AttributeError: module 'tensorflow' has no attribute 'Session'
The reason is compatibility with tf2. Assuming a keras backend, this would be the fix:
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)
Running AWS SageMaker with a custom model, the TrainingJob fails with an Algorithm Error when using Keras plus a Tensorflow backend in multi-gpu configuration:
from keras.utils import multi_gpu_model
parallel_model = multi_gpu_model(model, gpus=K)
parallel_model.compile(loss='categorical_crossentropy',
optimizer='rmsprop')
parallel_model.fit(x, y, epochs=20, batch_size=256)
This simple parallel model loading will fail. There is no further error or exception from CloudWatch logging. This configuration works properly on local machine with 2x NVIDIA GTX 1080, same Keras Tensorflow backend.
According to SageMaker documentation and tutorials the multi_gpu_model utility will work ok when Keras backend is MXNet, but I did not find any mention when the backend is Tensorflow with the same multi gpu configuration.
[UPDATE]
I have updated the code with the suggested answer below, and I'm adding some logging before the TrainingJob hangs
This logging repeats twice
2018-11-27 10:02:49.878414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-11-27 10:02:49.878462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-27 10:02:49.878471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3
2018-11-27 10:02:49.878477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y Y Y
2018-11-27 10:02:49.878481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N Y Y
2018-11-27 10:02:49.878486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: Y Y N Y
2018-11-27 10:02:49.878492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: Y Y Y N
2018-11-27 10:02:49.879340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 14874 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1b.0, compute capability: 7.0)
2018-11-27 10:02:49.879486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 14874 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1c.0, compute capability: 7.0)
2018-11-27 10:02:49.879694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:2 with 14874 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1d.0, compute capability: 7.0)
2018-11-27 10:02:49.879872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:3 with 14874 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Before there is some logging info about each GPU, that repeats 4 times
2018-11-27 10:02:46.447639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
totalMemory: 15.78GiB freeMemory: 15.37GiB
According to the logging all the 4 GPUs are visible and loaded in the Tensorflow Keras backend. After that no application logging follows, the TrainingJob status is inProgress for a while, after that it becomes Failed with the same Algorithm Error.
Looking at CloudWatch logging I can see some metrics at work. Specifically GPU Memory Utilization, CPU Utilization are ok, while GPU utilization is 0%.
[UPDATE]
Due to a known bug on Keras that is about saving a multi gpu model, I'm using this override of the multi_gpu_model utility in keras.utils
from keras.layers import Lambda, concatenate
from keras import Model
import tensorflow as tf
def multi_gpu_model(model, gpus):
#source: https://github.com/keras-team/keras/issues/8123#issuecomment-354857044
if isinstance(gpus, (list, tuple)):
num_gpus = len(gpus)
target_gpu_ids = gpus
else:
num_gpus = gpus
target_gpu_ids = range(num_gpus)
def get_slice(data, i, parts):
shape = tf.shape(data)
batch_size = shape[:1]
input_shape = shape[1:]
step = batch_size // parts
if i == num_gpus - 1:
size = batch_size - step * i
else:
size = step
size = tf.concat([size, input_shape], axis=0)
stride = tf.concat([step, input_shape * 0], axis=0)
start = stride * i
return tf.slice(data, start, size)
all_outputs = []
for i in range(len(model.outputs)):
all_outputs.append([])
# Place a copy of the model on each GPU,
# each getting a slice of the inputs.
for i, gpu_id in enumerate(target_gpu_ids):
with tf.device('/gpu:%d' % gpu_id):
with tf.name_scope('replica_%d' % gpu_id):
inputs = []
# Retrieve a slice of the input.
for x in model.inputs:
input_shape = tuple(x.get_shape().as_list())[1:]
slice_i = Lambda(get_slice,
output_shape=input_shape,
arguments={'i': i,
'parts': num_gpus})(x)
inputs.append(slice_i)
# Apply model on slice
# (creating a model replica on the target device).
outputs = model(inputs)
if not isinstance(outputs, list):
outputs = [outputs]
# Save the outputs for merging back together later.
for o in range(len(outputs)):
all_outputs[o].append(outputs[o])
# Merge outputs on CPU.
with tf.device('/cpu:0'):
merged = []
for name, outputs in zip(model.output_names, all_outputs):
merged.append(concatenate(outputs,
axis=0, name=name))
return Model(model.inputs, merged)
This works ok on local 2x NVIDIA GTX 1080 / Intel Xeon / Ubuntu 16.04. It will fails on SageMaker Training Job.
I have posted this issue on AWS Sagemaker forum in
TrainingJob custom algorithm with Keras backend and multi GPU
SageMaker Fails when using Multi-GPU with
keras.utils.multi_gpu_model
[UPDATE]
I have slightly modified the tf.session code adding some initializers
with tf.Session() as session:
K.set_session(session)
session.run(tf.global_variables_initializer())
session.run(tf.tables_initializer())
and now at least I can see that one GPU (I assume device gpu:0) is used from the instance metrics. The multi-gpu does not work anyways.
This might not be the best answer for your problem, but this is what I am using for a multi-gpu model with Tensorflow backend. First i initialize using:
def setup_multi_gpus():
"""
Setup multi GPU usage
Example usage:
model = Sequential()
...
multi_model = multi_gpu_model(model, gpus=num_gpu)
multi_model.fit()
About memory usage:
https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory
"""
import tensorflow as tf
from keras.utils.training_utils import multi_gpu_model
from tensorflow.python.client import device_lib
# IMPORTANT: Tells tf to not occupy a specific amount of memory
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True # dynamically grow the memory used on the GPU
sess = tf.Session(config=config)
set_session(sess) # set this TensorFlow session as the default session for Keras.
# getting the number of GPUs
def get_available_gpus():
local_device_protos = device_lib.list_local_devices()
return [x.name for x in local_device_protos if x.device_type == 'GPU']
num_gpu = len(get_available_gpus())
print('Amount of GPUs available: %s' % num_gpu)
return num_gpu
Then i call
# Setup multi GPU usage
num_gpu = setup_multi_gpus()
and create a model.
...
After which you're able to make it a multi GPU model.
multi_model = multi_gpu_model(model, gpus=num_gpu)
multi_model.compile...
multi_model.fit...
The only thing here that is different from what you are doing is the way Tensorflow is initializing the GPU's. I can't imagine it being the problem, but it might be worth trying out.
Good luck!
Edit: I noticed sequence to sequence not being able to work with multi GPU. Is that the type of model you are trying to train?
I apologize for the slow response.
It seems there are a lot of threads that are running in parallel, and I want to link them together, so that other individuals who have the same issue can see the progress and discussion going on.
https://forums.aws.amazon.com/thread.jspa?messageID=881541
https://forums.aws.amazon.com/thread.jspa?messageID=881540
https://github.com/aws/sagemaker-python-sdk/issues/512
There a few questions in regards to this.
What version of TensorFlow and Keras?
I am not too sure what is causing this problem. Does your container have all of the needed dependencies such as CUDA and etc? https://www.tensorflow.org/install/gpu
Were you able to train using single GPU with Keras?
I want to specify the gpu to run my process. And I set it as follows:
import tensorflow as tf
with tf.device('/gpu:0'):
a = tf.constant(3.0)
with tf.Session() as sess:
while True:
print sess.run(a)
However it still allocate memory in both my two gpus.
| 0 7479 C python 5437MiB
| 1 7479 C python 5437MiB
There are 3 ways to achieve this:
Using CUDA_VISIBLE_DEVICES environment variable.
by setting environment variable CUDA_VISIBLE_DEVICES="1" makes only device 1 visible and by setting CUDA_VISIBLE_DEVICES="0,1" makes devices 0 and 1 visible. You can do this in python by having a line os.environ["CUDA_VISIBLE_DEVICES"]="0,1" after importing os package.
Using with tf.device('/gpu:2') and creating the graph. Then it will use GPU device 2 to run.
Using config = tf.ConfigProto(device_count = {'GPU': 1}) and then sess = tf.Session(config=config). This will use GPU device 1.
TF would allocate all available memory on each visible GPU if not told otherwise. Here are 5 ways to stick to just one (or a few) GPUs.
Bash solution. Set CUDA_VISIBLE_DEVICES=0,1 in your terminal/console before starting python or jupyter notebook:
CUDA_VISIBLE_DEVICES=0,1 python script.py
Python solution. run next 2 lines of code before constructing a session
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"
Automated solution. Method below will automatically detect GPU devices that are not used by other scripts and set CUDA_VISIBLE_DEVICES for you. You have to call mask_unused_gpus before constructing a session. It will filter out GPUs by current memory usage. This way you can run multiple instances of your script at once without changing your code or setting console parameters.
The function:
import subprocess as sp
import os
def mask_unused_gpus(leave_unmasked=1):
ACCEPTABLE_AVAILABLE_MEMORY = 1024
COMMAND = "nvidia-smi --query-gpu=memory.free --format=csv"
try:
_output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
memory_free_info = _output_to_list(sp.check_output(COMMAND.split()))[1:]
memory_free_values = [int(x.split()[0]) for i, x in enumerate(memory_free_info)]
available_gpus = [i for i, x in enumerate(memory_free_values) if x > ACCEPTABLE_AVAILABLE_MEMORY]
if len(available_gpus) < leave_unmasked: raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, available_gpus[:leave_unmasked]))
except Exception as e:
print('"nvidia-smi" is probably not installed. GPUs are not masked', e)
mask_unused_gpus(2)
Limitations: if you start multiple scripts at once it might cause a collision, because memory is not allocated immediately when you construct a session. In case it is a problem for you, you can use a randomized version as in original source code: mask_busy_gpus()
Tensorflow 2.0 suggest yet another method:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
# Restrict TensorFlow to only use the first GPU
try:
tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
except RuntimeError as e:
# Visible devices must be set at program startup
print(e)
Tensorflow/Keras also allows to specify gpu to be used with session config. I can recommend it only if setting environment variable is not an options (i.e. an MPI run). Because it tend to be the least reliable of all methods, especially with keras.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0,1"
with tf.Session(config) as sess:
#or K.set_session(tf.Session(config))
I believe that you need to set CUDA_VISIBLE_DEVICES=1. Or which ever GPU you want to use. If you make only one GPU visible, you will refer to it as /gpu:0 in tensorflow regardless of what you set the environment variable to.
More info on that environment variable: https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
You can modify the GPU options settings by adding at the begining of your python script:
gpu_options = tf.GPUOptions(visible_device_list="0")
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
"0" is here the name of the GPU you want to use. You can have the list of the GPU available by typing the command nvidia-smi in the terminal prompt.
With Keras, these 2 functions allow the selection of CPU or GPU and in the case of GPU the fraction of memory that will be used.
import os
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
def set_cpu_option():
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["CUDA_VISIBLE_DEVICES"] = ""
def set_gpu_option(which_gpu, fraction_memory):
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = fraction_memory
config.gpu_options.visible_device_list = which_gpu
set_session(tf.Session(config=config))
return
set_gpu_option("0", 0.9)
# or
set_cpu_option()
def set_specific_gpu(ID):
gpus_all_physical_list = tf.config.list_physical_devices(device_type='GPU')
tf.config.set_visible_devices(gpus_all_physical_list[ID], 'GPU')
refer to https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices
import tensorflow as tf
gpu_number = 2 #### GPU number
gpus = tf.config.list_physical_devices('GPU')
if gpus:
tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
The most elegant and clean way I have seen this work for me on my multi-core gpu setup is:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
tf_device='/gpu:0'
This assigns the task to gpu device 1.
Similarly, doing something on the lines:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="2"
tf_device='/gpu:0'
The os.environ command can be seen as a way of making only that GPU device exposed on which you intend to run the code. The second command just picks the first of the available devices that you specified.
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="3"
The only thing which worked for me cleanly from within Processes to assign specific GPU to each process in a Pool.
TF 2.9 and above version of tensorflow has changed the APIs, so updating for the same,
gpus = tf.config.list_physical_devices('GPU')
gpu_id = 0
if gpus:
# Restrict TensorFlow to only use only one GPU based on gpu_id
try:
tf.config.set_visible_devices(gpus[gpu_id], 'GPU')
logical_gpus = tf.config.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
except RuntimeError as e:
# Visible devices must be set before GPUs have been initialized
print(e)