Iterate CPU and GPU devices in TensorFlow

I am aware that TensorFlow can explicitly place computation on a device by naming it, e.g. "/cpu:0" or "/gpu:0". However, this is hard-coded. Is there a way to iterate over all visible devices with a built-in API?

Here is what you would like to have:
import tensorflow as tf
from tensorflow.python.client import device_lib
def get_all_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]
all_devices = get_all_devices()
for device_name in all_devices:
    with tf.device(device_name):
        # Device names look like "/device:CPU:0", so compare case-insensitively.
        if "cpu" in device_name.lower():
            # Do something
            pass
        if "gpu" in device_name.lower():
            # Do something else
            pass
Code is inspired from the best answer here: How to get current available GPUs in tensorflow?
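For TensorFlow 2.x, the same loop can be sketched with the public tf.config API instead of the internal device_lib module. This is a minimal sketch, not the original answer's code: device_kind and for_each_device are hypothetical helper names, and TensorFlow (assumed to be 2.1 or newer) is imported lazily inside for_each_device so the name-parsing helper works on its own.

```python
def device_kind(device_name):
    """Return the lower-cased device type, e.g. 'gpu' for '/device:GPU:0'."""
    # The type is the second-to-last colon-separated field of the name.
    parts = device_name.rsplit(":", 2)
    if len(parts) < 2:
        raise ValueError("unexpected device name: %r" % device_name)
    return parts[-2].lstrip("/").lower()


def for_each_device(on_cpu, on_gpu):
    """Call on_cpu(name) or on_gpu(name) inside tf.device for each logical device."""
    import tensorflow as tf  # assumption: TensorFlow >= 2.1 is installed

    for device in tf.config.list_logical_devices():
        with tf.device(device.name):
            kind = device_kind(device.name)
            if kind == "cpu":
                on_cpu(device.name)
            elif kind == "gpu":
                on_gpu(device.name)
```

For example, for_each_device(print, print) would print the name of every CPU and GPU device TensorFlow can see.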

Related

Tensorflow operation seemingly not using GPU

I need to perform a job that averages large numbers of long vectors multiple times, and I would like this to be done on my GPU.
Monitoring nvtop and htop while running, I see that GPU (which always shows top activity when I train Keras models) is not being used at all in these operations, while CPU-use surges during these operations.
I have simulated it in the code snippet below (trying to minimize non-tf-work).
What am I doing wrong?
import tensorflow as tf
from tensorflow.math import add_n, add, scalar_mul
import numpy as np
tf.debugging.set_log_device_placement(True)
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)
os.environ["CUDA_VISIBLE_DEVICES"]="1"
#Make a random numpy matrix
vecs=np.random.rand(100, 300)
with sess.as_default():
    with tf.device('/GPU:0'):
        for _ in range(1000):
            #vecs=np.random.rand(100, 300)
            tf_vecs=tf.Variable(vecs, dtype=tf.float64)
            tf_invlgt=tf.Variable(1/np.shape(vecs)[0],dtype=tf.float64)
            vectors=tf.unstack(tf_vecs)
            sum_vecs=add_n(vectors)
            mean_vec=tf.Variable(scalar_mul(tf_invlgt, sum_vecs))
Thanks
Michael
I might be wrong, but could it be that CUDA_VISIBLE_DEVICES should be "0", like this:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
See the GitHub comment here.
If it still does not work, you can also add a small piece of code to check whether TensorFlow can see the GPU devices:
from tensorflow.python.client import device_lib
def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
This is mentioned here
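The order of operations matters here: CUDA reads CUDA_VISIBLE_DEVICES when it initializes, so the variable has to be set before TensorFlow first touches the GPU (in the question's snippet it is set after the session is created). A minimal sketch of that ordering, where restrict_visible_gpus is a hypothetical helper name:

```python
import os
import sys


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA; returns the value that was set."""
    if "tensorflow" in sys.modules:
        # Not necessarily fatal, but the setting is ignored once the GPUs are initialized.
        print("warning: tensorflow is already imported; set this earlier if it has no effect")
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


restrict_visible_gpus([0])
# Import TensorFlow (and create any session) only after this point.
```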

Run TensorFlow op in graph mode in tf 2.x

I would like to benchmark some TensorFlow operations (for example against each other or against PyTorch). However, most of the time I end up writing something like:
import numpy as np
import tensorflow as tf
tf_device = '/GPU:0'
a = np.random.normal(scale=100, size=shape).astype(np.int64)
b = np.array(7).astype(np.int64)
with tf.device(tf_device):
    a_tf = tf.constant(a)
    b_tf = tf.constant(b)
%timeit tf.math.floormod(a_tf, b_tf)
The problem with this approach is that it does the computation in eager-mode (I think in particular that it has to perform GPU to CPU placement). Eventually, I want to use those ops in a tf.keras model and therefore would like to evaluate their performance in graph mode.
What is the preferred way to do it?
My google searches have led to nothing and I don't know how to use sessions like in tf 1.x.
What you are looking for is tf.function. Check this tutorial and these docs.
As the tutorial says, in TensorFlow 2, eager execution is turned on by default. The user interface is intuitive and flexible (running one-off operations is much easier and faster), but this can come at the expense of performance and deployability. To get performant and portable models, use tf.function to make graphs out of your programs.
Check this code:
import numpy as np
import tensorflow as tf
import timeit
tf_device = '/GPU:0'
shape = [100000]
a = np.random.normal(scale=100, size=shape).astype(np.int64)
b = np.array(7).astype(np.int64)
@tf.function
def experiment(a_tf, b_tf):
    # Return the result so the op is not pruned from the traced graph.
    return tf.math.floormod(a_tf, b_tf)
with tf.device(tf_device):
    a_tf = tf.constant(a)
    b_tf = tf.constant(b)
# warm up
experiment(a_tf, b_tf)
print("In graph mode:", timeit.timeit(lambda: experiment(a_tf, b_tf), number=10))
print("In eager mode:", timeit.timeit(lambda: tf.math.floormod(a_tf, b_tf), number=10))
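When timing by hand, it helps to separate warm-up calls (the first call to a tf.function traces the graph) from the timed calls. The helper below is a generic sketch using only the standard library; benchmark and its parameters are hypothetical names, and the function being timed is assumed to return only once its result is ready (for GPU work, for instance, by converting the result with .numpy()).

```python
import timeit


def benchmark(fn, warmup=1, number=10, repeat=3):
    """Return the best average time per call of fn, in seconds."""
    for _ in range(warmup):
        fn()  # untimed calls, e.g. to trigger tf.function tracing
    # Each entry is the total time for `number` calls; keep the fastest run.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number
```

With a hypothetical tf.function-wrapped graph_fn that returns a tensor, a call such as benchmark(lambda: graph_fn(a_tf, b_tf).numpy()) would time the graph version, and the same helper can wrap the eager call for comparison.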

Why does logical_gpus = tf.config.experimental.list_logical_devices('GPU') lead to an empty list?

I use TensorFlow version 2.0 and would like to configure the GPUs with it.
For TensorFlow 1.x, it was done in the following way:
# GPU configuration
from keras.backend.tensorflow_backend import set_session
import keras
configtf = tf.compat.v1.ConfigProto()
configtf.gpu_options.allow_growth = True
configtf.gpu_options.visible_device_list = "0"
sess = tf.compat.v1.Session(config=configtf)
set_session(sess)
However, set_session is no longer available in TensorFlow 2.0, so to access the GPUs I tried following this guide. Both snippets below return an empty list of available GPUs, which means TensorFlow is not using them.
gpus = tf.config.experimental.list_physical_devices("GPU")
gpus
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
logical_gpus
I do have Tesla K80 access available.
What will be the right way to configure tf for it to use the available GPUs? Any help will be appreciated.
What worked in this situation was exporting the available GPUs into the conda environment with the following command:
(your_environment) user@machine: export CUDA_VISIBLE_DEVICES=0
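Since an unexpected CUDA_VISIBLE_DEVICES value is one possible reason for an empty device list, it can help to print what the process actually sees before importing TensorFlow. A small diagnostic sketch (describe_cuda_visibility is a hypothetical helper, and it only inspects the environment, not the driver):

```python
import os


def describe_cuda_visibility(environ=None):
    """Summarize how CUDA_VISIBLE_DEVICES restricts the GPUs this process can see."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "unset: all GPUs are visible"
    ids = [tok.strip() for tok in value.split(",") if tok.strip()]
    if not ids:
        return "empty: no GPU is visible"
    return "restricted to: " + ",".join(ids)


print(describe_cuda_visibility())
```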

How to get reproducible result when running Keras with Tensorflow backend

Every time I run an LSTM network with Keras in a Jupyter notebook, I get a different result. I have googled a lot and tried several solutions, but none of them work. Here are the solutions I tried:
set numpy random seed
random_seed=2017
from numpy.random import seed
seed(random_seed)
set tensorflow random seed
from tensorflow import set_random_seed
set_random_seed(random_seed)
set built-in random seed
import random
random.seed(random_seed)
set PYTHONHASHSEED
import os
os.environ['PYTHONHASHSEED'] = '0'
add PYTHONHASHSEED in jupyter notebook kernel.json
{
"language": "python",
"display_name": "Python 3",
"env": {"PYTHONHASHSEED": "0"},
"argv": [
"python",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
]
}
and the version of my env is:
Keras: 2.0.6
Tensorflow: 1.2.1
CPU or GPU: CPU
and this is my code:
model = Sequential()
model.add(LSTM(16, input_shape=(time_steps,nb_features), return_sequences=True))
model.add(LSTM(16, input_shape=(time_steps,nb_features), return_sequences=False))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='linear'))
model.compile(loss='mse',optimizer='adam')
The seed is definitely missing from your model definition. Detailed documentation can be found here: https://keras.io/initializers/.
In essence, your layers use random variables as the basis for their parameters. Therefore you get different outputs every time.
One example:
model.add(Dense(1, activation='linear',
                kernel_initializer=keras.initializers.RandomNormal(seed=1337),
                bias_initializer=keras.initializers.Constant(value=0.1)))
Keras itself has a section about getting reproducible results in its FAQ: (https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development). It gives the following code snippet to produce reproducible results:
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/fchollet/keras/issues/2280#issuecomment-306959926
import os
os.environ['PYTHONHASHSEED'] = '0'
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
Keras + Tensorflow.
Step 1, disable GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Step 2, seed those libraries which are included in your code, say "tensorflow, numpy, random".
import tensorflow as tf
import numpy as np
import random as rn
sd = 1 # Here sd means seed.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED']=str(sd)
from keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
Make sure these two pieces of code are included at the start of your code, then the result will be reproducible.
I resolved this issue by adding os.environ['TF_DETERMINISTIC_OPS'] = '1'.
Here is an example:
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
#rest of the code
#TensorFlow version 2.3.1
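The seeding steps from the answers above can be gathered into one helper. This is a sketch under assumptions: seed_everything is a hypothetical name, NumPy and TensorFlow are seeded only if they are installed, and tf.random.set_seed is the TensorFlow 2.x call (the 1.x equivalent used above is tf.set_random_seed). Setting PYTHONHASHSEED from inside a running interpreter only affects child processes, so it still has to be set before Python starts to influence the current one.

```python
import os
import random


def seed_everything(seed):
    """Seed the random number generators that are available in this environment."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only picked up by new Python processes
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # assumption: TensorFlow 2.x
    except ImportError:
        pass  # TensorFlow not installed; nothing to seed


seed_everything(2017)
```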

How to set specific gpu in tensorflow?

I want to specify the GPU my process runs on, and I set it as follows:
import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant(3.0)
with tf.Session() as sess:
    while True:
        print(sess.run(a))
However, it still allocates memory on both of my GPUs.
| 0 7479 C python 5437MiB
| 1 7479 C python 5437MiB
There are 3 ways to achieve this:
Using the CUDA_VISIBLE_DEVICES environment variable.
Setting the environment variable CUDA_VISIBLE_DEVICES="1" makes only device 1 visible, and setting CUDA_VISIBLE_DEVICES="0,1" makes devices 0 and 1 visible. You can do this in Python with the line os.environ["CUDA_VISIBLE_DEVICES"]="0,1" after importing the os package.
Using with tf.device('/gpu:2') when creating the graph. The ops created in that block are then placed on GPU device 2.
Using config = tf.ConfigProto(device_count = {'GPU': 1}) and then sess = tf.Session(config=config). Note that device_count caps how many GPUs the session uses (here, one) rather than selecting the GPU with index 1.
TF would allocate all available memory on each visible GPU if not told otherwise. Here are 5 ways to stick to just one (or a few) GPUs.
Bash solution. Set CUDA_VISIBLE_DEVICES=0,1 in your terminal/console before starting python or jupyter notebook:
CUDA_VISIBLE_DEVICES=0,1 python script.py
Python solution. Run the next 2 lines of code before constructing a session:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"
Automated solution. Method below will automatically detect GPU devices that are not used by other scripts and set CUDA_VISIBLE_DEVICES for you. You have to call mask_unused_gpus before constructing a session. It will filter out GPUs by current memory usage. This way you can run multiple instances of your script at once without changing your code or setting console parameters.
The function:
import subprocess as sp
import os
def mask_unused_gpus(leave_unmasked=1):
    ACCEPTABLE_AVAILABLE_MEMORY = 1024
    COMMAND = "nvidia-smi --query-gpu=memory.free --format=csv"
    try:
        _output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
        memory_free_info = _output_to_list(sp.check_output(COMMAND.split()))[1:]
        memory_free_values = [int(x.split()[0]) for i, x in enumerate(memory_free_info)]
        available_gpus = [i for i, x in enumerate(memory_free_values) if x > ACCEPTABLE_AVAILABLE_MEMORY]
        if len(available_gpus) < leave_unmasked:
            raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
        os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, available_gpus[:leave_unmasked]))
    except Exception as e:
        print('"nvidia-smi" is probably not installed. GPUs are not masked', e)
mask_unused_gpus(2)
Limitations: if you start multiple scripts at once it might cause a collision, because memory is not allocated immediately when you construct a session. In case it is a problem for you, you can use a randomized version as in original source code: mask_busy_gpus()
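The parsing and selection steps inside mask_unused_gpus can be checked without a GPU by feeding them captured text. A sketch, assuming nvidia-smi output in the CSV shape produced by the command above (a header line followed by one "<number> MiB" line per GPU); parse_free_memory, pick_gpus and the sample text are illustrative, not part of the original function:

```python
def parse_free_memory(csv_text):
    """Return free memory in MiB per GPU from the CSV text of the nvidia-smi query."""
    lines = [line.strip() for line in csv_text.splitlines() if line.strip()]
    return [int(line.split()[0]) for line in lines[1:]]  # skip the header line


def pick_gpus(free_mib, min_free_mib=1024, count=1):
    """Return the indices of the first `count` GPUs with more than min_free_mib free."""
    usable = [i for i, free in enumerate(free_mib) if free > min_free_mib]
    if len(usable) < count:
        raise ValueError("Found only %d usable GPUs in the system" % len(usable))
    return usable[:count]


# Made-up sample output for three GPUs, the second one busy.
sample = "memory.free [MiB]\n11000 MiB\n300 MiB\n8000 MiB\n"
print(pick_gpus(parse_free_memory(sample), count=2))  # prints [0, 2]
```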
TensorFlow 2.0 suggests yet another method:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    except RuntimeError as e:
        # Visible devices must be set at program startup
        print(e)
Tensorflow/Keras also allows specifying the GPU to be used through the session config. I can recommend it only if setting the environment variable is not an option (e.g. an MPI run), because it tends to be the least reliable of all methods, especially with Keras.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0,1"
with tf.Session(config=config) as sess:
    pass  # build and run the graph here
# or K.set_session(tf.Session(config=config))
I believe that you need to set CUDA_VISIBLE_DEVICES=1, or whichever GPU you want to use. If you make only one GPU visible, you will refer to it as /gpu:0 in TensorFlow regardless of what you set the environment variable to.
More info on that environment variable: https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
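The renumbering can be made concrete with a small pure-Python sketch. logical_gpu_mapping is a hypothetical helper that mimics the mapping for a well-formed, comma-separated list of indices; it does not query CUDA or TensorFlow.

```python
def logical_gpu_mapping(cuda_visible_devices):
    """Map TensorFlow-style names ('/gpu:0', ...) to the physical IDs listed in the variable."""
    ids = [tok.strip() for tok in cuda_visible_devices.split(",") if tok.strip()]
    # The first listed physical GPU becomes /gpu:0, the second /gpu:1, and so on.
    return {"/gpu:%d" % i: physical for i, physical in enumerate(ids)}


print(logical_gpu_mapping("1"))
print(logical_gpu_mapping("2,0"))
```

With CUDA_VISIBLE_DEVICES="1", the only entry is /gpu:0, pointing at physical GPU 1; with "2,0", /gpu:0 is physical GPU 2 and /gpu:1 is physical GPU 0.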
You can modify the GPU options by adding the following at the beginning of your Python script:
gpu_options = tf.GPUOptions(visible_device_list="0")
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
Here "0" is the ID of the GPU you want to use. You can list the available GPUs by running the command nvidia-smi in a terminal.
With Keras, these 2 functions allow the selection of CPU or GPU and in the case of GPU the fraction of memory that will be used.
import os
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
def set_cpu_option():
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # see issue #152
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
def set_gpu_option(which_gpu, fraction_memory):
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction_memory
    config.gpu_options.visible_device_list = which_gpu
    set_session(tf.Session(config=config))
    return
set_gpu_option("0", 0.9)
# or
set_cpu_option()
def set_specific_gpu(ID):
    gpus_all_physical_list = tf.config.list_physical_devices(device_type='GPU')
    tf.config.set_visible_devices(gpus_all_physical_list[ID], 'GPU')
Refer to https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices
import tensorflow as tf
gpu_number = 2 #### GPU number
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
The most elegant and clean way I have seen this work for me on my multi-core gpu setup is:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
tf_device='/gpu:0'
This assigns the task to gpu device 1.
Similarly, something along these lines assigns the task to GPU device 2:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="2"
tf_device='/gpu:0'
The os.environ command can be seen as a way of making only that GPU device exposed on which you intend to run the code. The second command just picks the first of the available devices that you specified.
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="3"
This is the only thing that worked cleanly for me from within processes, to assign a specific GPU to each process in a Pool.
TensorFlow 2.9 and above changed the APIs, so here is an updated version:
gpus = tf.config.list_physical_devices('GPU')
gpu_id = 0
if gpus:
    # Restrict TensorFlow to use only one GPU, selected by gpu_id
    try:
        tf.config.set_visible_devices(gpus[gpu_id], 'GPU')
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)