I am fitting a model in a for loop, but I am getting an error that my GPU's memory is full. I am using Keras in the Anaconda Spyder IDE. My GPU is an Asus GTX 1060 6GB.
I have also tried things like K.clear_session(), gc.collect(), tf.reset_default_graph(), and del custom_model, but none of them worked.
GPU properties say 98% of memory is full.
Nothing flushes the GPU memory except numba.cuda.close(), but that will not allow me to use my GPU again. The only way to clear it is to restart the kernel and rerun my code.
I am looking for any script or code I can add to my for loop so that the GPU memory is cleared on every iteration.
Wrap the model creation and training part in a function, then run it in a subprocess. When training is done, the subprocess terminates and the GPU memory is freed.
Something like:
import multiprocessing

def create_model_and_train():
    ...
    ...

p = multiprocessing.Process(target=create_model_and_train)
p.start()
p.join()
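For instance, here is a minimal end-to-end sketch of the loop version; the model architecture, the toy data and the function body are placeholders, not from the original question:

import multiprocessing
import numpy as np

def create_model_and_train(x_train, y_train):
    # Import Keras inside the child process so TF/CUDA is initialized there,
    # not in the parent, and all GPU memory dies with the process.
    from keras.models import Sequential
    from keras.layers import Dense
    model = Sequential([Dense(10, activation='relu', input_shape=(x_train.shape[1],)),
                        Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    model.fit(x_train, y_train, epochs=1, verbose=0)

if __name__ == '__main__':
    x, y = np.random.rand(100, 8), np.random.rand(100, 1)
    for _ in range(5):
        p = multiprocessing.Process(target=create_model_and_train, args=(x, y))
        p.start()
        p.join()  # GPU memory is released when the child process exits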
Or you can create the function below and call it before each run:
from keras.backend.tensorflow_backend import set_session
from keras.backend.tensorflow_backend import clear_session
from keras.backend.tensorflow_backend import get_session
import tensorflow
import gc

# Reset Keras Session
def reset_keras():
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier  # this is from global space - change this as you need
    except:
        pass

    print(gc.collect())  # if it does something you should see a number as output

    # use the same config as you used to create the session
    config = tensorflow.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tensorflow.Session(config=config))
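A usage sketch for the loop case; build_classifier, x_train and y_train below are placeholders for your own code:

for run in range(10):
    reset_keras()                     # free the previous iteration's GPU memory
    classifier = build_classifier()   # placeholder: build a fresh model
    classifier.fit(x_train, y_train)  # placeholder: train it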
I need to perform a job that averages large numbers of long vectors multiple times, and I would like this to be done on my GPU.
Monitoring nvtop and htop while running, I see that the GPU (which always shows top activity when I train Keras models) is not being used at all in these operations, while CPU use surges during them.
I have simulated it in the code snippet below (trying to minimize non-TF work).
What am I doing wrong?
import os
import tensorflow as tf
from tensorflow.math import add_n, add, scalar_mul
import numpy as np

tf.debugging.set_log_device_placement(True)

config = tf.compat.v1.ConfigProto()  # assumed; the original snippet uses config without showing how it was built
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Make a random numpy matrix
vecs = np.random.rand(100, 300)

with sess.as_default():
    with tf.device('/GPU:0'):
        for _ in range(1000):
            # vecs = np.random.rand(100, 300)
            tf_vecs = tf.Variable(vecs, dtype=tf.float64)
            tf_invlgt = tf.Variable(1 / np.shape(vecs)[0], dtype=tf.float64)
            vectors = tf.unstack(tf_vecs)
            sum_vecs = add_n(vectors)
            mean_vec = tf.Variable(scalar_mul(tf_invlgt, sum_vecs))
Thanks
Michael
I might be wrong, but could it be that CUDA_VISIBLE_DEVICES should be "0", like:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
See the GitHub comment here.
If it still does not work, you can also add a small piece of code to check whether TensorFlow can see the GPU devices:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
This is mentioned here
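A quick usage check (assuming at least one visible GPU, this prints something like ['/device:GPU:0']):

print(get_available_gpus())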
I have read answers like:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
set_session(tf.Session(config=config))
But it just doesn't work. There have been so many updates in both Keras and TF that almost anything written in 2017 doesn't work!
So, how to limit memory usage?
One way to keep TensorFlow from reserving all GPU RAM is to let the reservation grow on demand. This method will allow you to train multiple NNs on the same GPU, but you cannot set a threshold on the amount of memory you want to reserve.
Use the following snippet before importing Keras, or just use tf.keras instead.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
Seems that I have the same problem. Have you tried this?
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)
This method will make the application allocate only as much GPU memory as it actually needs at runtime.
2022 update of @Yustina Ivanova's answer:
Most people will encounter errors such as (one of the following):
AttributeError: module 'tensorflow.keras.backend' has no attribute 'tensorflow_backend'
AttributeError: module 'tensorflow.keras.backend' has no attribute 'set_session'
AttributeError: module 'tensorflow' has no attribute 'ConfigProto'
AttributeError: module 'tensorflow' has no attribute 'Session'
The reason is compatibility with tf2. Assuming a keras backend, this would be the fix:
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)
I've been trying to run Keras on a CPU cluster, and for this I need to limit the number of cores used (it's a shared system). To limit the number of cores, I landed on this answer. However, it simply doesn't work. I tried running this basic code:
from keras.applications.vgg16 import VGG16
from keras import backend as K
import numpy as np

conf = K.tf.ConfigProto(device_count={'CPU': 1},
                        intra_op_parallelism_threads=2,
                        inter_op_parallelism_threads=2)
K.set_session(K.tf.Session(config=conf))

model = VGG16(weights='imagenet', include_top=False)
x = np.random.randn(1000, 224, 224, 3)
features = model.predict(x)
When I run this and check htop, it uses all (128) logical cores. Is this a bug in keras? Or am I doing something wrong?
Keras says that my CPU supports SSE4.1 and SSE4.2, which are not used because I didn't compile TensorFlow from source. Will compiling from source also fix the original question?
EDIT: I've found a workaround when launching the keras script from a unix machine:
taskset -c 0-23 python keras_script.py
This will run the script on the first 24 cores of the machine. It works, but it would still be nice if this was available from within keras/tensorflow.
I found this snippet of code that works for me, hope it helps:
from keras import backend as K
import tensorflow as tf

jobs = 2  # number of cores
config = tf.ConfigProto(intra_op_parallelism_threads=jobs,
                        inter_op_parallelism_threads=jobs,
                        allow_soft_placement=True,
                        device_count={'CPU': jobs})
session = tf.Session(config=config)
K.set_session(session)
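If you are on TF 2.x, where ConfigProto and sessions no longer exist, a minimal sketch of the equivalent thread limits uses tf.config.threading; call these before running any other TensorFlow code:

import tensorflow as tf

jobs = 2  # number of cores to use
tf.config.threading.set_intra_op_parallelism_threads(jobs)
tf.config.threading.set_inter_op_parallelism_threads(jobs)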
I've got a few files, each in its own module:
main.py
watch.py
read.py
detect.py  <-- uses the TensorFlow-based library darkflow, which relies on graph mode
translate.py  <-- uses TF eager execution
During darkflow's TFNet initialization I get this error:
Traceback (most recent call last):
  File "/home/justin/Projects/comp3931/main.py", line 6, in <module>
    watcher = Watcher('res/vid/planet_earth_s01e01/video.mp4', 'res/vid/planet_earth_s01e01/english.srt')
  File "/home/justin/Projects/comp3931/watch.py", line 9, in __init__
    self.detector = Detector()
  File "/home/justin/Projects/comp3931/detect.py", line 6, in __init__
    self.tfnet = TFNet(self.options)
  File "/usr/local/lib64/python3.6/site-packages/darkflow/net/build.py", line 75, in __init__
    self.build_forward()
  File "/usr/local/lib64/python3.6/site-packages/darkflow/net/build.py", line 105, in build_forward
    self.inp = tf.placeholder(tf.float32, inp_size, 'input')
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1677, in placeholder
    raise RuntimeError("tf.placeholder() is not compatible with "
RuntimeError: tf.placeholder() is not compatible with eager execution.
So I assume that when I instantiate the Translator class from the translate.py file, it enables eager execution for the whole program, which is then incompatible with the calls to darkflow's TFNet class used by the Detector class in detect.py.
If I run translate.py independently of the others it works fine; the other modules also work fine if I run them without translate.py involved.
I guess that because they use different contexts (graph/eager), the whole thing can't run together in the same program. I've tried looking at the documentation, but could not find a way to switch back to graph mode when needed.
Is there any way I can run both eager and graph modes in the same application in different places?
It is best to write code that's compatible with both graph mode and eager execution; a short sketch of that style follows the quoted guidelines. From the documentation:
Use tf.data for input processing instead of queues. It's faster and easier.
Use object-oriented layer APIs—like tf.keras.layers and tf.keras.Model—since they have explicit storage for variables.
Most model code works the same during eager and graph execution, but there are exceptions. (For example, dynamic models using Python control flow to change the computation based on inputs.)
Once eager execution is enabled with tf.enable_eager_execution, it cannot be turned off. Start a new Python session to return to graph execution.
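For instance, here is a small sketch written in that style (the toy data and model below are illustrative only); it runs unchanged whether or not eager execution is enabled:

import numpy as np
import tensorflow as tf

# A tf.data pipeline plus a tf.keras model works in both eager and graph mode.
data = tf.data.Dataset.from_tensor_slices(
    (np.random.rand(64, 4).astype('float32'),
     np.random.rand(64, 1).astype('float32'))).batch(8)

model = tf.keras.Sequential([tf.keras.layers.Dense(8, activation='relu'),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(data, epochs=1, steps_per_epoch=8)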
That said, it is possible to use eager execution while in graph mode by using tfe.py_func(). Here is the code example from the documentation (I just added the imports and asserts):
import tensorflow as tf
import tensorflow.contrib.eager as tfe

def my_py_func(x):
    assert tf.executing_eagerly()
    x = tf.matmul(x, x)  # You can use tf ops
    print(x)             # but it's eager!
    return x

assert not tf.executing_eagerly()
with tf.Session() as sess:
    x = tf.placeholder(dtype=tf.float32)
    # Call eager function in graph!
    pf = tfe.py_func(my_py_func, [x], tf.float32)
    sess.run(pf, feed_dict={x: [[2.0]]})  # [[4.0]]
The reverse is also possible, as Alex Passos explains in this video. Here is an example inspired by the one in the video:
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()

def my_graph_func(x):
    assert not tf.executing_eagerly()
    w = tfe.Variable(2.0)
    b = tfe.Variable(4.0)
    return x * w + b

assert tf.executing_eagerly()
g = tfe.make_template("g", my_graph_func, create_graph_function_=True)
print(g(3))
There's also an unofficial way to switch modes, using the eager_mode and graph_mode contexts defined in tensorflow.python.eager.context like this:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow.python.eager.context import eager_mode, graph_mode

with eager_mode():
    print("Eager mode")
    assert tf.executing_eagerly()
    x1 = tfe.Variable(5.0)
    print(x1.numpy())

print()

with graph_mode():
    print("Graph mode")
    assert not tf.executing_eagerly()
    x2 = tfe.Variable(5.0)
    with tf.Session():
        x2.initializer.run()
        print(x2.eval())
As it is not official, you should probably avoid it in production code, but it may come in handy when debugging, or in a Jupyter notebook. One last option is to use this switch_to() function:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow.python.eager.context import context, EAGER_MODE, GRAPH_MODE

def switch_to(mode):
    ctx = context()._eager_context
    ctx.mode = mode
    ctx.is_eager = mode == EAGER_MODE

switch_to(EAGER_MODE)
assert tf.executing_eagerly()
v = tfe.Variable(3.0)
print(v.numpy())
assert tf.get_default_graph().get_operations() == []

switch_to(GRAPH_MODE)
assert not tf.executing_eagerly()
v = tfe.Variable(3.0)
init = tf.global_variables_initializer()
assert len(tf.get_default_graph().get_operations()) > 0
with tf.Session():
    init.run()
    print(v.eval())
It is really a hack, but it may be useful in a Jupyter notebook, if you don't like nesting all your code in with blocks.
https://www.tensorflow.org/programmers_guide/eager
(scroll down to "Use eager execution in a graph environment").
Maybe that helps...
I want to specify the GPU on which my process runs, and I set it as follows:
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant(3.0)
    with tf.Session() as sess:
        while True:
            print(sess.run(a))
However, it still allocates memory on both of my two GPUs.
| 0 7479 C python 5437MiB
| 1 7479 C python 5437MiB
There are 3 ways to achieve this:
Using the CUDA_VISIBLE_DEVICES environment variable. Setting the environment variable CUDA_VISIBLE_DEVICES="1" makes only device 1 visible, and setting CUDA_VISIBLE_DEVICES="0,1" makes devices 0 and 1 visible. You can do this in Python by adding the line os.environ["CUDA_VISIBLE_DEVICES"]="0,1" after importing the os package.
Using with tf.device('/gpu:2') when creating the graph. Then it will use GPU device 2 to run.
Using config = tf.ConfigProto(device_count={'GPU': 1}) and then sess = tf.Session(config=config). This limits TensorFlow to at most one GPU device (a sketch of options 2 and 3 follows).
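A minimal sketch of options 2 and 3 (TF 1.x API; allow_soft_placement is added here so the op falls back to an available device if /gpu:2 is hidden):

import tensorflow as tf

# Option 2: create the graph under a device scope, so ops go to GPU 2.
with tf.device('/gpu:2'):
    a = tf.constant(3.0)

# Option 3: limit the session to at most one GPU device.
config = tf.ConfigProto(device_count={'GPU': 1}, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(a))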
TF would allocate all available memory on each visible GPU if not told otherwise. Here are 5 ways to stick to just one (or a few) GPUs.
Bash solution. Set CUDA_VISIBLE_DEVICES=0,1 in your terminal/console before starting python or jupyter notebook:
CUDA_VISIBLE_DEVICES=0,1 python script.py
Python solution. Run the next 2 lines of code before constructing a session:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"
Automated solution. The method below will automatically detect GPU devices that are not used by other scripts and set CUDA_VISIBLE_DEVICES for you. You have to call mask_unused_gpus before constructing a session. It filters out GPUs by current memory usage. This way you can run multiple instances of your script at once without changing your code or setting console parameters.
The function:
import subprocess as sp
import os

def mask_unused_gpus(leave_unmasked=1):
    ACCEPTABLE_AVAILABLE_MEMORY = 1024
    COMMAND = "nvidia-smi --query-gpu=memory.free --format=csv"

    try:
        _output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
        memory_free_info = _output_to_list(sp.check_output(COMMAND.split()))[1:]
        memory_free_values = [int(x.split()[0]) for i, x in enumerate(memory_free_info)]
        available_gpus = [i for i, x in enumerate(memory_free_values) if x > ACCEPTABLE_AVAILABLE_MEMORY]

        if len(available_gpus) < leave_unmasked:
            raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
        os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, available_gpus[:leave_unmasked]))
    except Exception as e:
        print('"nvidia-smi" is probably not installed. GPUs are not masked', e)

mask_unused_gpus(2)
Limitations: if you start multiple scripts at once it might cause a collision, because memory is not allocated immediately when you construct a session. In case it is a problem for you, you can use a randomized version as in original source code: mask_busy_gpus()
TensorFlow 2.0 suggests yet another method:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    except RuntimeError as e:
        # Visible devices must be set at program startup
        print(e)
TensorFlow/Keras also allows you to specify the GPU to be used via the session config. I can recommend it only if setting the environment variable is not an option (i.e. an MPI run), because it tends to be the least reliable of all methods, especially with Keras.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0,1"
with tf.Session(config=config) as sess:
    ...  # or K.set_session(tf.Session(config=config))
I believe that you need to set CUDA_VISIBLE_DEVICES=1, or whichever GPU you want to use. If you make only one GPU visible, you will refer to it as /gpu:0 in TensorFlow regardless of what you set the environment variable to.
More info on that environment variable: https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
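A minimal sketch of that renumbering (the environment variable must be set before TensorFlow initializes CUDA):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # only physical GPU 1 is visible

import tensorflow as tf
with tf.device('/gpu:0'):                 # ...and inside TF it appears as /gpu:0
    a = tf.constant(3.0)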
You can modify the GPU option settings by adding this at the beginning of your Python script:
gpu_options = tf.GPUOptions(visible_device_list="0")
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
"0" here is the index of the GPU you want to use. You can get the list of available GPUs by typing the command nvidia-smi in the terminal prompt.
With Keras, these 2 functions allow the selection of CPU or GPU and in the case of GPU the fraction of memory that will be used.
import os
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

def set_cpu_option():
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

def set_gpu_option(which_gpu, fraction_memory):
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction_memory
    config.gpu_options.visible_device_list = which_gpu
    set_session(tf.Session(config=config))
    return

set_gpu_option("0", 0.9)
# or
set_cpu_option()
import tensorflow as tf

def set_specific_gpu(ID):
    gpus_all_physical_list = tf.config.list_physical_devices(device_type='GPU')
    tf.config.set_visible_devices(gpus_all_physical_list[ID], 'GPU')
refer to https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices
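Usage, assuming the machine has at least one GPU:

set_specific_gpu(0)  # make only the first physical GPU visible to TensorFlow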
import tensorflow as tf

gpu_number = 2  #### GPU number
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
The most elegant and clean way I have seen this work for me on my multi-GPU setup is:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
tf_device='/gpu:0'
This assigns the task to GPU device 1.
Similarly, doing something along the lines of:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="2"
tf_device='/gpu:0'
The os.environ command can be seen as a way of exposing only the GPU device on which you intend to run the code. The second command just picks the first of the available devices that you specified.
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="3"
This is the only thing that worked cleanly for me from within processes to assign a specific GPU to each process in a Pool (a sketch of that pattern follows).
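A sketch of that pattern (the GPU id list and the worker body are placeholders): a Pool initializer hands each worker one GPU id and masks the rest before TensorFlow is imported.

import os
import multiprocessing

def init_worker(gpu_queue):
    # Each worker takes one GPU id and hides all others before importing TF.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_queue.get())

def work(task):
    import tensorflow as tf  # import TF only after the mask is set
    return task, len(tf.config.list_physical_devices('GPU'))  # each worker sees exactly one GPU

if __name__ == '__main__':
    gpu_ids = multiprocessing.Manager().Queue()
    for gpu in [0, 1]:  # placeholder: the GPU ids available on your machine
        gpu_ids.put(gpu)
    with multiprocessing.Pool(2, initializer=init_worker, initargs=(gpu_ids,)) as pool:
        print(pool.map(work, range(4)))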
TF 2.9 and later versions of TensorFlow have changed the APIs, so here is an updated version:
gpus = tf.config.list_physical_devices('GPU')
gpu_id = 0
if gpus:
    # Restrict TensorFlow to use only one GPU based on gpu_id
    try:
        tf.config.set_visible_devices(gpus[gpu_id], 'GPU')
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)