TensorFlow operation seemingly not using GPU - tensorflow

I need to perform a job that averages large numbers of long vectors multiple times, and I would like this to be done on my GPU.
Monitoring nvtop and htop while it runs, I see that the GPU (which always shows heavy activity when I train Keras models) is not being used at all in these operations, while CPU use surges.
I have simulated it in the code snippet below (trying to minimize non-TF work).
What am I doing wrong?
import os
import tensorflow as tf
from tensorflow.math import add_n, add, scalar_mul
import numpy as np
tf.debugging.set_log_device_placement(True)
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)
os.environ["CUDA_VISIBLE_DEVICES"]="1"
# Make a random numpy matrix
vecs=np.random.rand(100, 300)
with sess.as_default():
    with tf.device('/GPU:0'):
        for _ in range(1000):
            #vecs=np.random.rand(100, 300)
            tf_vecs=tf.Variable(vecs, dtype=tf.float64)
            tf_invlgt=tf.Variable(1/np.shape(vecs)[0],dtype=tf.float64)
            vectors=tf.unstack(tf_vecs)
            sum_vecs=add_n(vectors)
            mean_vec=tf.Variable(scalar_mul(tf_invlgt, sum_vecs))
Thanks
Michael

I might be wrong, but could it be that CUDA_VISIBLE_DEVICES should be "0"? For example:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
See the GitHub comment here.
If it still does not work, you can also add a small piece of code to check whether TensorFlow can see the GPU devices:
from tensorflow.python.client import device_lib
def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
This is mentioned here.
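As a complementary check on TensorFlow 2.x, the public tf.config API can list the GPUs TensorFlow sees, and an eager tensor's .device attribute reports where a result actually ended up. The sketch below is only an illustration under assumptions, not the asker's code: it swaps unstack/add_n/scalar_mul for a single tf.reduce_mean call, and the helper name mean_of_rows is made up for this example.

```python
import numpy as np
import tensorflow as tf

# GPUs that TensorFlow can see; an empty list means ops will run on the CPU.
visible_gpus = tf.config.list_physical_devices('GPU')
print("Visible GPUs:", visible_gpus)


def mean_of_rows(matrix):
    """Average the rows of a 2-D array (hypothetical helper for this sketch)."""
    tensor = tf.constant(matrix, dtype=tf.float64)
    # One reduction op stands in for unstack + add_n + scalar_mul.
    return tf.reduce_mean(tensor, axis=0)


vecs = np.random.rand(100, 300)
mean_vec = mean_of_rows(vecs)

# The device string shows where the result tensor lives.
print("Result placed on:", mean_vec.device)
```

If the printed device ends in CPU:0 although a GPU is installed, the visible-devices setting (or the CUDA installation) is the first thing to look at.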

Related

Cannot get reproducible results with ImageDataGenerator in keras

I am trying to get reproducible results between multiple runs of the same script in keras, but I get different ones at each iteration. My code looks like this:
import numpy as np
from numpy.random import seed
import random as rn
import os
seed_num = 1
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
os.environ['PYTHONHASHSEED'] = '1'
os.environ['TF_DETERMINISTIC_OPS'] = '1'
np.random.seed(seed_num)
rn.seed(seed_num)
import tensorflow as tf
tf.random.set_seed(seed_num)
import tensorflow.keras as ks
from tensorflow.python.keras import backend as K
...some imports...
from tensorflow.keras.preprocessing.image import ImageDataGenerator
.... data loading etc ....
generator = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)
generator.fit(X_train, seed=seed_num)
my_model.fit(
    generator.flow(X_train, y_train, batch_size=batch_size, shuffle=False, seed=seed_num),
    validation_data=(X_val, y_val),
    callbacks=callbacks,
    epochs=epochs,
    shuffle=False)
I identified the problem to be in ImageDataGenerator, i.e., when setting generator = ImageDataGenerator() without any augmentation the results are reproducible. I am also running on CPU and TensorFlow version is 2.4.1. What am I missing here?
Using GPU while creating augmented images can produce nondeterministic results.
To get reproducible results using ImageDataGenerator and GPU, one way is the following:
import random, os
import numpy as np
import tensorflow as tf
def set_seed(seed=0):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    random.seed(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = "1"
    # The variable TensorFlow reads is TF_CUDNN_DETERMINISTIC
    os.environ['TF_CUDNN_DETERMINISTIC'] = "1"
    os.environ['PYTHONHASHSEED'] = str(seed)
set_seed()
Call set_seed() again right before model.fit():
set_seed()
model.fit(...)
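The reason for reseeding right before model.fit() is that any random draw made between the first set_seed() and training advances the generator state, so the training step no longer starts from the same point. A minimal sketch of that effect, using only random and NumPy (no TensorFlow); the helper name seed_python_and_numpy is hypothetical:

```python
import random

import numpy as np


def seed_python_and_numpy(seed=0):
    """Reset the Python and NumPy global generators (illustrative helper)."""
    random.seed(seed)
    np.random.seed(seed)


seed_python_and_numpy()
baseline = np.random.rand(3)  # what a run sees straight after seeding

seed_python_and_numpy()
_ = np.random.rand(5)         # stand-in for data loading or augmentation setup
drifted = np.random.rand(3)   # the state has advanced, so this draw differs

seed_python_and_numpy()
_ = np.random.rand(5)
seed_python_and_numpy()       # reseed immediately before the step that must repeat
restored = np.random.rand(3)

print(np.array_equal(baseline, drifted))
print(np.array_equal(baseline, restored))
```

The first comparison should come out False and the second True, which is the behaviour the second set_seed() call relies on.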
Otherwise, you can install the package tensorflow-determinism:
pip install tensorflow-determinism
If you're using Google Colab, restart your runtime afterwards or it probably won't work.
The package will interact with GPU to produce deterministic results.
import random, os
import numpy as np
import tensorflow as tf
def set_seed(seed=0):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
set_seed()
# code
In this case too, call set_seed() again before model.fit():
set_seed()
model.fit(...)

Run TensorFlow op in graph mode in tf 2.x

I would like to benchmark some TensorFlow operations (for example against each other or against PyTorch). However, most of the time I will write something like:
import numpy as np
import tensorflow as tf
tf_device = '/GPU:0'
a = np.random.normal(scale=100, size=shape).astype(np.int64)
b = np.array(7).astype(np.int64)
with tf.device(tf_device):
    a_tf = tf.constant(a)
    b_tf = tf.constant(b)
%timeit tf.math.floormod(a_tf, b_tf)
The problem with this approach is that it does the computation in eager-mode (I think in particular that it has to perform GPU to CPU placement). Eventually, I want to use those ops in a tf.keras model and therefore would like to evaluate their performance in graph mode.
What is the preferred way to do it?
My Google searches have led to nothing, and I don't know how to use sessions like in TF 1.x.
What you are looking for is tf.function. Check this tutorial and these docs.
As the tutorial says, in TensorFlow 2, eager execution is turned on by default. The user interface is intuitive and flexible (running one-off operations is much easier and faster), but this can come at the expense of performance and deployability. To get performant and portable models, use tf.function to make graphs out of your programs.
Check this code:
import numpy as np
import tensorflow as tf
import timeit
tf_device = '/GPU:0'
shape = [100000]
a = np.random.normal(scale=100, size=shape).astype(np.int64)
b = np.array(7).astype(np.int64)
@tf.function
def experiment(a_tf, b_tf):
    # Return the result so the op is part of the traced graph's outputs.
    return tf.math.floormod(a_tf, b_tf)
with tf.device(tf_device):
    a_tf = tf.constant(a)
    b_tf = tf.constant(b)
# warm up (the first call traces the function into a graph)
experiment(a_tf, b_tf)
print("In graph mode:", timeit.timeit(lambda: experiment(a_tf, b_tf), number=10))
print("In eager mode:", timeit.timeit(lambda: tf.math.floormod(a_tf, b_tf), number=10))
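The warm-up call matters because tf.function traces the Python function into a graph on the first call for a given input signature and reuses that graph afterwards. Here is a small sketch of that behaviour, assuming TensorFlow 2.x; the counter trace_count and the function name floormod_graph are illustrative only:

```python
import tensorflow as tf

trace_count = 0  # incremented only while TensorFlow traces the function


@tf.function
def floormod_graph(a, b):
    global trace_count
    trace_count += 1  # Python side effect: runs during tracing, not on every call
    return tf.math.floormod(a, b)


a = tf.constant([10, 11, 12], dtype=tf.int64)
b = tf.constant(7, dtype=tf.int64)

first = floormod_graph(a, b)   # traces and builds the graph
second = floormod_graph(a, b)  # reuses the already traced graph

print("Traces:", trace_count)
print("Result:", second.numpy())
```

Timing the first call would therefore include tracing overhead, which is why benchmarks usually exclude it.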

Reproducible results with keras

How can I get reproducible results with keras? I followed these steps but I am still getting different results every time I run the Jupyter notebook. I also tried setting shuffle=False when calling model.fit().
My configuration:
conda 4.3.25
keras 2.0.6 with tensorflow backend
tensorflow-gpu 1.2.1
python 3.5
windows 10
See the answer I posted to another question. The main idea is to first disable the GPU and then seed the libraries you use (NumPy, random, and so on). In short, including the code below at the beginning of your script may help solve your problem.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import numpy as np
import tensorflow as tf
import random as rn
from keras import backend as K
sd = 1
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED']=str(sd)
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
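The snippet above targets TensorFlow 1.x. On TensorFlow 2.x the graph-level seed is set with tf.random.set_seed instead of tf.set_random_seed, and no explicit session is needed in eager mode. A rough sketch of the same seeding idea under that assumption (the helper name seed_everything is made up here, and this does not by itself cover GPU or multi-threading nondeterminism):

```python
import random

import numpy as np
import tensorflow as tf


def seed_everything(seed=1):
    """Seed the Python, NumPy and TensorFlow global generators (illustrative helper)."""
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)


seed_everything()
first_draw = tf.random.uniform((3,)).numpy()

seed_everything()  # resetting the global seed restarts the random sequence
second_draw = tf.random.uniform((3,)).numpy()

print(np.array_equal(first_draw, second_draw))
```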

How to get reproducible result when running Keras with Tensorflow backend

Every time I run an LSTM network with Keras in a Jupyter notebook, I get a different result. I have googled a lot and tried several solutions, but none of them work. Here are the solutions I tried:
Set the NumPy random seed:
random_seed=2017
from numpy.random import seed
seed(random_seed)
Set the TensorFlow random seed:
from tensorflow import set_random_seed
set_random_seed(random_seed)
Set the built-in random seed:
import random
random.seed(random_seed)
Set PYTHONHASHSEED:
import os
os.environ['PYTHONHASHSEED'] = '0'
Add PYTHONHASHSEED to the Jupyter notebook kernel.json:
{
    "language": "python",
    "display_name": "Python 3",
    "env": {"PYTHONHASHSEED": "0"},
    "argv": [
        "python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
    ]
}
and the version of my env is:
Keras: 2.0.6
Tensorflow: 1.2.1
CPU or GPU: CPU
and this is my code:
model = Sequential()
model.add(LSTM(16, input_shape=(time_steps,nb_features), return_sequences=True))
model.add(LSTM(16, input_shape=(time_steps,nb_features), return_sequences=False))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='linear'))
model.compile(loss='mse',optimizer='adam')
The seed is definitely missing from your model definition. Detailed documentation can be found here: https://keras.io/initializers/.
In essence, your layers initialize their parameters from random variables, so you get different outputs every time.
One example:
model.add(Dense(1, activation='linear',
                kernel_initializer=keras.initializers.RandomNormal(seed=1337),
                bias_initializer=keras.initializers.Constant(value=0.1)))
Keras itself has a section about getting reproducible results in its FAQ: (https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development). It gives the following code snippet to produce reproducible results:
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/fchollet/keras/issues/2280#issuecomment-306959926
import os
os.environ['PYTHONHASHSEED'] = '0'
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
Keras + Tensorflow.
Step 1, disable GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Step 2, seed those libraries which are included in your code, say "tensorflow, numpy, random".
import tensorflow as tf
import numpy as np
import random as rn
sd = 1 # Here sd means seed.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED']=str(sd)
from keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
Make sure these two pieces of code are at the start of your script; the results should then be reproducible.
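One caveat about the PYTHONHASHSEED line in the snippets above: Python reads that variable when the interpreter starts, so assigning os.environ['PYTHONHASHSEED'] inside an already running script only affects child processes, not the current one. That is why setting it in the shell or in kernel.json is the more reliable route. A small sketch that shows the startup behaviour by launching fresh interpreters; the helper name string_hash_in_child is hypothetical:

```python
import os
import subprocess
import sys


def string_hash_in_child(hash_seed):
    """Return hash('keras') as computed by a fresh interpreter (illustrative helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    completed = subprocess.run(
        [sys.executable, "-c", "print(hash('keras'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


# The same seed at startup gives the same string hash in every new process.
hash_a = string_hash_in_child(0)
hash_b = string_hash_in_child(0)

print(hash_a == hash_b)
```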
I resolved this issue by adding os.environ['TF_DETERMINISTIC_OPS'] = '1'.
Here is an example:
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
# rest of the code
# TensorFlow version 2.3.1

Iterate cpu and gpu devices in Tensorflow

I am aware that TensorFlow can explicitly place computation on a device with "/cpu:0" or "/gpu:0". However, this is hard-coded. Is there any way to iterate over all visible devices with a built-in API?
Here is what you would like to have:
import tensorflow as tf
from tensorflow.python.client import device_lib
def get_all_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]
all_devices = get_all_devices()
for device_name in all_devices:
    with tf.device(device_name):
        # Device names look like "/device:CPU:0", so compare case-insensitively.
        if "cpu" in device_name.lower():
            # Do something
            pass
        if "gpu" in device_name.lower():
            # Do something else
            pass
The code is inspired by the top answer here: How to get current available GPUs in tensorflow?
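On TensorFlow 2.x, the same loop can probably be written with the public tf.config API rather than the internal device_lib module. A sketch under that assumption; the dictionary devices_by_type and the constants placed on each device are placeholders for real work:

```python
import tensorflow as tf

devices_by_type = {}

# Logical devices carry both a name (e.g. "/device:CPU:0") and a device_type.
for device in tf.config.list_logical_devices():
    devices_by_type.setdefault(device.device_type, []).append(device.name)
    if device.device_type == "CPU":
        with tf.device(device.name):
            cpu_value = tf.constant(1.0)  # placeholder for CPU-specific work
    elif device.device_type == "GPU":
        with tf.device(device.name):
            gpu_value = tf.constant(2.0)  # placeholder for GPU-specific work

print(devices_by_type)
```

Branching on device_type avoids string matching on the device name altogether.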