How to prevent TPUEstimator from using GPU or TPU - tensorflow

I need to force TPUEstimator to use the CPU. I have a rented google machine and the GPU is already running training. Since the CPUs are idle, I want to start a second Tensorflow session for evaluation but I want to force the evaluation cycle to use CPUs only so that it does not steal GPU time.
I am assuming there is a flag in the run_config or similar for doing this but am struggling to find one in the TF documentation.
run_config = tf.contrib.tpu.RunConfig(
cluster=tpu_cluster_resolver,
master=FLAGS.master,
model_dir=FLAGS.output_dir,
save_checkpoints_steps=FLAGS.save_checkpoints_steps,
tpu_config=tf.contrib.tpu.TPUConfig(
iterations_per_loop=FLAGS.iterations_per_loop,
num_shards=FLAGS.num_tpu_cores,
per_host_input_for_training=is_per_host))

You can run a TPUEstimator locally by including two arguments: (1) use_tpu should be set to False, and (2) tf.contrib.tpu.RunConfig should be passed as the config argument.
my_tpu_estimator = tf.contrib.tpu.TPUEstimator(
model_fn=my_model_fn,
config=tf.contrib.tpu.RunConfig()
use_tpu=False)
The majority of example TPU models can be run in local mode by setting the command line flags:
$> python mnist_tpu.py --use_tpu=false --master=''
More documentation can be found here.

Related

Running Tensorflow model inference script on multiple GPU

I'm trying to run the model scoring (inference graph) from tensorflow objec detection API to run it on multiple GPU's, tried specifying the GPU number in the main, but it runs only on single GPU.placed GPU utilization snapshot here
Using tensorflow-gpu==1.13.1, can you kindly point me what I'm missing here.
for i in range(2):
with tf.device('/gpu:{}' . format(i)):
tf_init()
init = tf.global_variables_initializer
with detection_graph.as_default():
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
call to #run_inference_multiple_images function
The responses to this question should give you a few options for fixing this.
Usually TensorFlow will occupy all visible GPUs unless told otherwise. So if you haven't already tried, you could just remove the with tf.device line (assuming you only have the two GPUs) and TensorFlow should use them both.
Otherwise, I think the easiest is setting the environment variables with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1".

Tensorflow OOM error when allocating resources to GPU [duplicate]

I work in an environment in which computational resources are shared, i.e., we have a few server machines equipped with a few Nvidia Titan X GPUs each.
For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the GPU, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU.
The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used up.
Is there a way to make TensorFlow only allocate, say, 4 GB of GPU memory, if one knows that this is enough for a given model?
You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument:
# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
The per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that will be used by the process on each GPU on the same machine. Currently, this fraction is applied uniformly to all of the GPUs on the same machine; there is no way to set this on a per-GPU basis.
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
https://github.com/tensorflow/tensorflow/issues/1578
For TensorFlow 2.0 and 2.1 (docs):
import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth(True)
For TensorFlow 2.2+ (docs):
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
The docs also list some more methods:
Set environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
Use tf.config.experimental.set_virtual_device_configuration to set a hard limit on a Virtual GPU device.
Here is an excerpt from the Book Deep Learning with TensorFlow
In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. TensorFlow provides two configuration options on the session to control this. The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations, it starts out allocating very little memory, and as sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process.
1) Allow growth: (more flexible)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
The second method is per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. Note: No release of memory needed, it can even worsen memory fragmentation when done.
2) Allocate fixed memory:
To only allocate 40% of the total memory of each GPU by:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
Note:
That's only useful though if you truly want to bind the amount of GPU memory available on the TensorFlow process.
For Tensorflow version 2.0 and 2.1 use the following snippet:
import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu_devices[0], True)
For prior versions , following snippet used to work for me:
import tensorflow as tf
tf_config=tf.ConfigProto()
tf_config.gpu_options.allow_growth=True
sess = tf.Session(config=tf_config)
All the answers above assume execution with a sess.run() call, which is becoming the exception rather than the rule in recent versions of TensorFlow.
When using the tf.Estimator framework (TensorFlow 1.4 and above) the way to pass the fraction along to the implicitly created MonitoredTrainingSession is,
opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
trainingConfig = tf.estimator.RunConfig(session_config=conf, ...)
tf.estimator.Estimator(model_fn=...,
config=trainingConfig)
Similarly in Eager mode (TensorFlow 1.5 and above),
opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
tfe.enable_eager_execution(config=conf)
Edit: 11-04-2018
As an example, if you are to use tf.contrib.gan.train, then you can use something similar to bellow:
tf.contrib.gan.gan_train(........, config=conf)
You can use
TF_FORCE_GPU_ALLOW_GROWTH=true
in your environment variables.
In tensorflow code:
bool GPUBFCAllocator::GetAllowGrowthValue(const GPUOptions& gpu_options) {
const char* force_allow_growth_string =
std::getenv("TF_FORCE_GPU_ALLOW_GROWTH");
if (force_allow_growth_string == nullptr) {
return gpu_options.allow_growth();
}
Tensorflow 2.0 Beta and (probably) beyond
The API changed again. It can be now found in:
tf.config.experimental.set_memory_growth(
device,
enable
)
Aliases:
tf.compat.v1.config.experimental.set_memory_growth
tf.compat.v2.config.experimental.set_memory_growth
References:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/config/experimental/set_memory_growth
https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
See also:
Tensorflow - Use a GPU: https://www.tensorflow.org/guide/gpu
for Tensorflow 2.0 Alpha see: this answer
All the answers above refer to either setting the memory to a certain extent in TensorFlow 1.X versions or to allow memory growth in TensorFlow 2.X.
The method tf.config.experimental.set_memory_growth indeed works for allowing dynamic growth during the allocation/preprocessing. Nevertheless one may like to allocate from the start a specific-upper limit GPU memory.
The logic behind allocating a specific GPU memory would also be to prevent OOM memory during training sessions. For example, if one trains while opening video-memory consuming Chrome tabs/any other video consumption process, the tf.config.experimental.set_memory_growth(gpu, True) could result in OOM errors thrown, hence the necessity of allocating from the start more memory in certain cases.
The recommended and correct way in which to allot memory per GPU in TensorFlow 2.X is done in the following manner:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
# Restrict TensorFlow to only allocate 1GB of memory on the first GPU
try:
tf.config.experimental.set_virtual_device_configuration(
gpus[0],
[tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)]
Shameless plug: If you install the GPU supported Tensorflow, the session will first allocate all GPUs whether you set it to use only CPU or GPU. I may add my tip that even you set the graph to use CPU only you should set the same configuration(as answered above:) ) to prevent the unwanted GPU occupation.
And in an interactive interface like IPython and Jupyter, you should also set that configure, otherwise, it will allocate all memory and leave almost none for others. This is sometimes hard to notice.
If you're using Tensorflow 2 try the following:
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
For Tensorflow 2.0 this this solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)
# allocate 60% of GPU memory
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
set_session(tf.Session(config=config))
this code has worked for me:
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
Well I am new to tensorflow, I have Geforce 740m or something GPU with 2GB ram, I was running mnist handwritten kind of example for a native language with training data containing of 38700 images and 4300 testing images and was trying to get precision , recall , F1 using following code as sklearn was not giving me precise reults. once i added this to my existing code i started getting GPU errors.
TP = tf.count_nonzero(predicted * actual)
TN = tf.count_nonzero((predicted - 1) * (actual - 1))
FP = tf.count_nonzero(predicted * (actual - 1))
FN = tf.count_nonzero((predicted - 1) * actual)
prec = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * prec * recall / (prec + recall)
plus my model was heavy i guess, i was getting memory error after 147, 148 epochs, and then I thought why not create functions for the tasks so I dont know if it works this way in tensrorflow, but I thought if a local variable is used and when out of scope it may release memory and i defined the above elements for training and testing in modules, I was able to achieve 10000 epochs without any issues, I hope this will help..
i tried to train unet on voc data set but because of huge image size, memory finishes. i tried all the above tips, even tried with batch size==1, yet to no improvement. sometimes TensorFlow version also causes the memory issues. try by using
pip install tensorflow-gpu==1.8.0

how can i train two tensorflow scripts on single gpu parallelly?

I am getting error when i run 2 tensorflow scripts on single GPU.
I have tried growth and GPU memory allocations steps, still the first script executes without problem while 2nd script failes with ResourceExhaustedError , Graph session creation error.
Kindly help.
Run each program separately first for a few iterations and check nvidia-smi dmon to see how much memory that program actually requires. Then set config.gpu_options.per_process_gpu_memory_fraction = ... in your session configuration based on the memory information you learned from nvidia-smi dmon. If the memory required for both is greater than what you have available then you will run into this resources exhausted error.
You should do the following:
# don't allow cases where a single script takes up all VRAM
# this way we can try to run several scripts at the same time
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
...
If any of you know how to train two graphs in parallel in a single script, please let me know.

Tensorflow processing performance with multiple gpu

friends!
I have a question about processing with multiple gpu.
I'm using 4 gpus and tried simple A^n + B^n example in 3 way like below.
Single GPU
with tf.device('/gpu:0'):
....tf.matpow codes...
Multiple GPU
with tf.device('/gpu:0'):
....tf.matpow codes...
with tf.device('/gpu:1'):
....tf.matpow codes...
No specific gpu designated (I think maybe all of gpu used)
....just tf.matpow codes...
when tried this, the result was incomprehensible.
the result was
1. single gpu : 6.x seconds
2. multiple gpu(2 gpus) : 2.x seconds
3. no specific gpu designated(maybe 4 gpus) : 4.x seconds
I cannot understand why #2 is faster than #3.
Anyone can help me?
Thanks.
While the Tensorflow scheduler works well for single GPUs, it is not as good at optimizing the placement of computations on multiple GPUs yet. (Although it is being worked on presently.) Without further details, it's hard to know exactly what's going on. To get a get a better picture, you can log where the computations are actually being placed by the scheduler. You can do this by setting the log_device_placement flag on when creating the tf.Session:
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
In the third code sample (where no GPU was designated) Tensorflow didn't use all of your GPUs. By default if Tensorflow can find a GPU ("/gpu:0") to use it assigns as many calculations as possible to that GPU. You would need to tell it specifically that you want want it to use all 4 like you did in the second code sample.
From the Tensorflow documentation:
If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you will need to specify the preference explicitly:
with tf.device('/gpu:2'):
tf code here

Tensorflow, restore variables in a specific device

Maybe my question is a bit naive, but I really didn't find anything in the tensorflow documentation.
I have a trained tensorflow model where the variables of it was placed in the GPU. Now I would like to restore this model and test it using the CPU.
If I do this via 'tf.train.Saver.restore` as in the example:
saver = tf.train.import_meta_graph("/tmp/graph.meta")
saver.restore(session, "/tmp/model.ckp")
I have the following excpetion:
InvalidArgumentError: Cannot assign a device to node 'b_fc8/b_fc8/Adam_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
How can I make restore these variables in the CPU?
Thanks
Use clear_devices flag, ie
saver = tf.train.import_meta_graph("/tmp/graph.meta", clear_devices=True)
I'm using tensorflow 0.12 and clear_devices=True and tf.device('/cpu:0') was not working with me (saver.restore was still trying to assign variables to /gpu:0).
I really needed to force everything to /cpu:0 since I was loading several models which wouldn't fit in GPU memory anyways. Here are two alternatives to force everything to /cpu:0
Set os.environ['CUDA_VISIBLE_DEVICES']=''
Use the device_count of ConfigProto like tf.Session(config=tf.ConfigProto(device_count={"GPU": 0, "CPU": 1}))