Measuring GPU memory usage with TensorFlow profiler

Measuring GPU memory usage with TensorFlow profiler - tensorflow

Is there a way to properly measure GPU usage of tf.Estimator model using TensorFlow profiler? I've followed the documentation:
g = tf.Graph()
sess = tf.Session(graph=g)
run_meta = tf.RunMetadata()
time_and_memory_args = tf.profiler.ProfileOptionBuilder.time_and_memory()
with g.as_default():
data_shape = [BATCH_SIZE] + [3, 224, 224]
in_plh = tf.placeholder(tf.float32, data_shape)
model = some_model(in_plh, args=model_args, training=True)
images = np.random.rand(BATCH_SIZE, 3, 224, 224)
sess.run(tf.global_variables_initializer())
sess.run(model, feed_dict={in_plh: images})
time_and_memory = tf.profiler.profile(g, run_meta=run_meta, cmd='op',
options=time_and_memory_args)
if time_and_memory is not None:
print('Total requested bytes:', time_and_memory.total_requested_bytes)
But the printed result is always 0.

Related

Tensorflow v1 The name 'features:0' refers to a Tensor which does not exist

I am using TF 1.15, and define a graph
g = tf.Graph()
with g.as_default():
# Graph Inputs
features = tf.placeholder(dtype=tf.float32,
shape=[None, 2], name='features')
targets = tf.placeholder(dtype=tf.float32,
shape=[None, 1], name='targets')
# Model Parameters
weights = tf.Variable(tf.zeros(shape=[2, 1],
dtype=tf.float32), name='weights')
bias = tf.Variable([[0.]], dtype=tf.float32, name='bias')
# Forward Pass
linear = tf.add(tf.matmul(features, weights), bias, name='linear')
ones = tf.ones(shape=tf.shape(linear))
zeros = tf.zeros(shape=tf.shape(linear))
prediction = tf.where(condition=tf.less(linear, 0.),
x=zeros,
y=ones,
name='prediction')
# Backward Pass
errors = targets - prediction
weight_update = tf.assign_add(weights,
tf.reshape(errors * features, (2, 1)),
name='weight_update')
bias_update = tf.assign_add(bias, errors,
name='bias_update')
train = tf.group(weight_update, bias_update, name='train')
saver = tf.train.Saver(name='saver')
and save it using
inputs = dict([(features.name, features)])
outputs = dict([(prediction.name, prediction)])
tf.saved_model.simple_save(sess, "my_path", inputs, outputs)
and I can use saved_model_cli to see the model, following is part of it
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['features:0'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 2)
name: features:0
but when I use TF2 tf.keras.model.load_model("my_path") it raise error KeyError: "The name 'features:0' refers to a Tensor which does not exist. The operation, 'features', does not exist in the graph.", using java api raise the similar error.
How to solve this?
Following is the code to train the model
import numpy as np
x_train, y_train = np.array([[1,2],[3,4],[1,3]]), np.array([1,2,1])
x_train.shape, y_train.shape
with tf.Session(graph=g) as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(5):
for example, target in zip(x_train, y_train):
feed_dict = {'features:0': example.reshape(-1, 2),
'targets:0': target.reshape(-1, 1)}
_ = sess.run(['train'], feed_dict=feed_dict)
w, b = sess.run(['weights:0', 'bias:0'])
print('Model parameters:\n')
print('Weights:\n', w)
print('Bias:', b)
saver.save(sess, save_path='perceptron')
pred = sess.run('prediction:0', feed_dict={features: x_train})
print(pred, sess)
and using
for op in g.get_operations():
print(op.name)
could see features is printed.

After I put the save code into with Session... issue solved.
g = tf.Graph()
with g.as_default():
# Graph Inputs
features = tf.placeholder(dtype=tf.float32,
shape=[None, 2], name='features')
targets = tf.placeholder(dtype=tf.float32,
shape=[None, 1], name='targets')
# Model Parameters
weights = tf.Variable(tf.zeros(shape=[2, 1],
dtype=tf.float32), name='weights')
bias = tf.Variable([[0.]], dtype=tf.float32, name='bias')
# Forward Pass
linear = tf.add(tf.matmul(features, weights), bias, name='linear')
ones = tf.ones(shape=tf.shape(linear))
zeros = tf.zeros(shape=tf.shape(linear))
prediction = tf.where(condition=tf.less(linear, 0.),
x=zeros,
y=ones,
name='prediction')
# Backward Pass
errors = targets - prediction
weight_update = tf.assign_add(weights,
tf.reshape(errors * features, (2, 1)),
name='weight_update')
bias_update = tf.assign_add(bias, errors,
name='bias_update')
train = tf.group(weight_update, bias_update, name='train')
saver = tf.train.Saver(name='saver')
inputs = dict([(features.name, features)])
outputs = dict([(prediction.name, prediction)])
tf.saved_model.simple_save(sess, "my_path", inputs, outputs)

Combine tf.keras.layers with Tensorflow low level API

Can i combine tf.keras.layers with low level tensorflow?
the code is not correct but i want to do something like that:create placeholders that later will be fed with data (in tf.Session()) and to feed that data to my model
X, Y = create_placeholders(n_x, n_y)
output = create_model('channels_last')(X)
cost = compute_cost(output, Y)

Yes, it is the same as using tf.layers.dense(). Using tf.keras.layers.Dense() is actually a preferred way in newest tensorflow version 1.13 (tf.layers.dense() is deprectated). For example
import tensorflow as tf
import numpy as np
x_train = np.array([[-1.551, -1.469], [1.022, 1.664]], dtype=np.float32)
y_train = np.array([1, 0], dtype=int)
x = tf.placeholder(tf.float32, shape=[None, 2])
y = tf.placeholder(tf.int32, shape=[None])
with tf.name_scope('network'):
layer1 = tf.keras.layers.Dense(2, input_shape=(2, ))
layer2 = tf.keras.layers.Dense(2, input_shape=(2, ))
fc1 = layer1(x)
logits = layer2(fc1)
with tf.name_scope('loss'):
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss_fn = tf.reduce_mean(xentropy)
with tf.name_scope('optimizer'):
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_op = optimizer.minimize(loss_fn)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
loss_val = sess.run(loss_fn, feed_dict={x:x_train, y:y_train})
_ = sess.run(train_op, feed_dict={x:x_train, y:y_train})

Create predictions with custom CNN using tfrecord input

My aim is to classify images into ten categories. I have a tfrecord file as input. You can download it here (30 MB). My modified the code according to the answer:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np
def my_cnn(images, num_classes, is_training): # is_training is not used...
with slim.arg_scope([slim.max_pool2d], kernel_size=[3, 3], stride=2):
net = slim.conv2d(images, 64, [5, 5])
net = slim.max_pool2d(net)
net = slim.conv2d(net, 64, [5, 5])
net = slim.max_pool2d(net)
net = slim.flatten(net)
net = slim.fully_connected(net, 192)
net = slim.fully_connected(net, num_classes, activation_fn=None)
return net
data_path = 'train-some.tfrecords'
with tf.Graph().as_default():
batch_size, height, width, channels = 10, 224, 224, 3
feature = {'train/image': tf.FixedLenFeature([], tf.string),
'train/label': tf.FixedLenFeature([], tf.int64)}
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
features = tf.parse_single_example(serialized_example, features=feature)
image = tf.decode_raw(features['train/image'], tf.float32)
label = tf.cast(features['train/label'], tf.int32)
image = tf.reshape(image, [224, 224, 3])
images, labels = tf.train.shuffle_batch([image, label], batch_size, capacity=30, num_threads=1, min_after_dequeue=10)
num_classes = 10
logits = my_cnn(images, num_classes, is_training=True)
probabilities = tf.nn.softmax(logits)
with tf.Session() as sess:
init_op = [tf.global_variables_initializer(), tf.local_variables_initializer()]
# Run the init_op, evaluate the model outputs and print the results:
sess.run(init_op)
#probabilities = sess.run(probabilities)
# Create a coordinator, launch the queue runner threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
while not coord.should_stop():
while True:
prob = sess.run(probabilities)
print('Probabilities Shape:')
print(prob.shape)
except tf.errors.OutOfRangeError:
# When done, ask the threads to stop.
print('Done training -- epoch limit reached')
finally:
coord.request_stop()
# Wait for threads to finish.
coord.join(threads)
# Save the model
saver = tf.train.Saver()
saver.save(sess, './slim_model/custom_model')
Unfortunately, I still have error messages:
ValueError: Tensor Tensor("Softmax:0", shape=(10, 10), dtype=float32) is not an element of this graph.
ValueError: Fetch argument cannot be interpreted as a Tensor. (Tensor Tensor("Softmax:0", shape=(10, 10), dtype=float32) is not an element of this graph.)

The issue is with your training. You need to start the queues using tf.train.start_queue_runners that will run a few threads to process and enqueue examples. Create a Coordinator and ask the queue runner to start its threads with the coordinator.
Check the code changes:
with tf.Session() as sess:
init_op = [tf.global_variables_initializer(), tf.local_variables_initializer()]
# Run the init_op, evaluate the model outputs and print the results:
sess.run(init_op)
#probabilities = sess.run(probabilities)
# Create a coordinator, launch the queue runner threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
while not coord.should_stop():
while True:
prob = sess.run(probabilities)
print('Probabilities Shape:')
print(prob.shape)
except tf.errors.OutOfRangeError:
# When done, ask the threads to stop.
print('Done training -- epoch limit reached')
finally:
coord.request_stop()
# Wait for threads to finish.
coord.join(threads)
# Save the model
saver = tf.train.Saver()
saver.save(sess, './slim_model/custom_model'
Output:
Probabilities Shape:
(10, 10)
Probabilities Shape:
(10, 10)
Probabilities Shape:
(10, 10)
Probabilities Shape:
(10, 10)
Probabilities Shape:
(10, 10)
Done training -- epoch limit reached
Code with the above fixes along with saving and restoring the model can be downloaded from here.

How to use the trained tensorflow network to run inference?

I am new to tensorflow and I hope you can help me.
I have built a tensorflow CNN network and trained it successfully. The training datasets are matlab arrays. Now I would like to use the trained network to run inference. I am not sure how to write the python code for inference.
During training, I saved the mode. I am not sure how to load the model in inference.
My inference data is also a matlab array, same as training data. How can I use it? During training, I used miniPatch from Tensorlayer, should I use miniPatch in inference two?
Below is my inference code: it gave a lot of errors:
print("\n\nPreparing testing data........................")
test_data = sio.loadmat('MyTest.mat')
Z0 = test_data['Real_testing1']
img_num_test = Z0.shape[0]
X_test = np.empty([img_num_test, 128, 128, 1], dtype=float)
X_test[:,:,:,0] = Z0
Y_test = np.column_stack((np.ones([img_num_test, 1], dtype=int),np.zeros([img_num_test, 1], dtype=int)))
print("\tTesting X shape: {0}".format(X_test.shape))
print("\tTesting Y shape: {0}".format(Y_test.shape))
print("\n\Restore the network ...")
save_dir = "checkpoints/";
epoch = 1000
model_name = save_dir + str(epoch) + '_model'
if not os.path.exists(save_dir):
os.makedirs(save_dir)
saver = tf.train.Saver().restore(sess, save_path=model_name)
start_time_begin = time.time()
print("\n\Running network...")
start_time = time.time()
y = model.Scribenet(X_test[0, :, :, :], False, 1.0)
y = sess.run([y], feed_dict=feed_dict)
print(y[0:9])
sess.close()
Below is my training code:
x = tf.placeholder(tf.float32, shape=[None, 128, 128, 1], name='x')
y_ = tf.placeholder(tf.int64, shape=[None, 2], name='y_')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
is_training = tf.placeholder(tf.bool, name='is_traininng')
net_in = x
net_out = model.MyCNN(net_in, is_training, keep_prob)
y = net_out
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_, name='cost'))
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
y_op = tf.argmax(tf.nn.softmax(y),1)
train_op = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999,
epsilon=1e-08, use_locking=False).minimize(cost)
sess.run(tf.global_variables_initializer())
save_dir = "checkpoints/";
if not os.path.exists(save_dir):
os.makedirs(save_dir)
saver = tf.train.Saver()
print("\n\nStart training the network ...")
start_time_begin = time.time()
for epoch in range(n_epoch):
start_time = time.time()
loss_ep = 0; n_step = 0
for X_train_a, y_train_a in tl.iterate.minibatches(X_train, Y_train,
batch_size, shuffle=True):
feed_dict = {x: X_train_a, y_: y_train_a, is_training: True, keep_prob: train_keep_prob}
loss, _ = sess.run([cost, train_op], feed_dict=feed_dict)
loss_ep += loss
n_step += 1
loss_ep = loss_ep/ n_step
if (epoch+1) % save_freq == 0:
model_name = save_dir + str(epoch+1) + '_model'
saver.save(sess, save_path=model_name)

The main issue seems to be that there's no graph building in your inference code. You either need to save the whole graph (in SavedModel format), or build a graph in your inference code and load your variables via a training checkpoint (probably the easiest to start). As long as the variable names are the same, you can load variables saved from the training graph into the inference graph.
So inference will be your training code but without the y_ placeholder and without the loss/optimizer logic. You can feed a single image (batch size 1) to start, so no need for batching logic either.

Fully Convolutional Network, Training Error

I apologize that I'm not good at English.
I'm trying to build my own Fully Convolutional Network using TensorFlow.
But I have difficulties on training this model with my own image data, whereas the MNIST data worked properly.
Here is my FCN model code: (Not using pre-trained or pre-bulit model)
import tensorflow as tf
import numpy as np
Loading MNIST Data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
images_flatten = tf.placeholder(tf.float32, shape=[None, 784])
images = tf.reshape(images_flatten, [-1,28,28,1]) # CNN deals with 3 dimensions
labels = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32) # Dropout Ratio
Convolutional Layers
# Conv. Layer #1
W1 = tf.Variable(tf.truncated_normal([3, 3, 1, 4], stddev = 0.1))
b1 = tf.Variable(tf.truncated_normal([4], stddev = 0.1))
FMA = tf.nn.conv2d(images, W1, strides=[1,1,1,1], padding='SAME')
# FMA stands for Fused Multiply Add, which means convolution
RELU = tf.nn.relu(tf.add(FMA, b1))
POOL = tf.nn.max_pool(RELU, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
# Conv. Layer #2
W2 = tf.Variable(tf.truncated_normal([3, 3, 4, 8], stddev = 0.1))
b2 = tf.Variable(tf.truncated_normal([8], stddev = 0.1))
FMA = tf.nn.conv2d(POOL, W2, strides=[1,1,1,1], padding='SAME')
RELU = tf.nn.relu(tf.add(FMA, b2))
POOL = tf.nn.max_pool(RELU, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
# Conv. Layer #3
W3 = tf.Variable(tf.truncated_normal([7, 7, 8, 16], stddev = 0.1))
b3 = tf.Variable(tf.truncated_normal([16], stddev = 0.1))
FMA = tf.nn.conv2d(POOL, W3, strides=[1,1,1,1], padding='VALID')
RELU = tf.nn.relu(tf.add(FMA, b3))
# Dropout
Dropout = tf.nn.dropout(RELU, keep_prob)
# Conv. Layer #4
W4 = tf.Variable(tf.truncated_normal([1, 1, 16, 10], stddev = 0.1))
b4 = tf.Variable(tf.truncated_normal([10], stddev = 0.1))
FMA = tf.nn.conv2d(Dropout, W4, strides=[1,1,1,1], padding='SAME')
LAST_RELU = tf.nn.relu(tf.add(FMA, b4))
Summary: [Conv-ReLU-Pool] - [Conv-ReLU-Pool] - [Conv-ReLU] - [Dropout] - [Conv-ReLU]
Define Loss, Accuracy
prediction = tf.squeeze(LAST_RELU)
# Because FCN returns (1 x 1 x class_num) in training
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction, labels))
# First arg is 'logits=' and the other one is 'labels='
optimizer = tf.train.AdamOptimizer(0.001)
train = optimizer.minimize(loss)
label_max = tf.argmax(labels, 1)
pred_max = tf.argmax(prediction, 1)
correct_pred = tf.equal(pred_max, label_max)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Training Model
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
image_batch, label_batch = mnist.train.next_batch(100)
sess.run(train, feed_dict={images: image_batch, labels: label_batch, keep_prob: 0.8})
if i % 10 == 0:
tr = sess.run([loss, accuracy], feed_dict={images: image_batch, labels: label_batch, keep_prob: 1.0})
print("Step %d, Loss %g, Accuracy %g" % (i, tr[0], tr[1]))
Loss: 0.784 (Approximately)
Accuracy: 94.8% (Approximately)
The problem is that, training this model with MNIST data worked very well, but with my own data, loss is always same(0.6319), and the output layer is always 0.
There is no difference with the code, excepting for the third convolutional layer's filter size. This filter size and input size which is compressed by previous pooling layers, must have same width & height. That's why the filter size in this layer is [7,7].
What is wrong with my model?..
The only different code between two cases (MNIST, my own data) is:
Placeholder
My own data has (128 x 64 x 1) and the label is 'eyes', 'not_eyes'
images = tf.placeholder(tf.float32, [None, 128, 64, 1])
labels = tf.placeholder(tf.int32, [None, 2])
3rd Convolutional Layer
W3 = tf.Variable(tf.truncated_normal([32, 16, 8, 16], stddev = 0.1))
Feeding (Batch)
image_data, label_data = input_data.get_batch(TRAINING_FILE, 10)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
for i in range(10000):
image_batch, label_batch = sess.run([image_data, label_data])
sess.run(train, feed_dict={images: image_batch, labels: label_batch, keep_prob: 0.8})
if i % 10 == 0: ... # Validation part is almost same, too...
coord.request_stop()
coord.join(threads)
Here "input_data" is an another python file in the same directory, and "get_batch(TRAINING_FILE, 10)" is the function that returns batch data. The code is:
def get_input_queue(txtfile_name):
images = []
labels = []
for line in open(txtfile_name, 'r'): # Here txt file has data's path, label, label number
cols = re.split(',|\n', line)
labels.append(int(cols[2]))
images.append(tf.image.decode_jpeg(tf.read_file(cols[0]), channels = 1))
input_queue = tf.train.slice_input_producer([images, labels], shuffle = True)
return input_queue
def get_batch(txtfile_name, batch_size):
input_queue = get_input_queue(txtfile_name)
image = input_queue[0]
label = input_queue[1]
image = tf.reshape(image, [128, 64, 1])
batch_image, batch_label = tf.train.batch([image, label], batch_size)
batch_label_one_hot = tf.one_hot(tf.to_int64(batch_label), 2, on_value=1.0, off_value=0.0)
return batch_image, batch_label_one_hot
It seems not to have any problem .... :( Please Help me..!!

Are your inputs scaled appropriately?. The jpegs are in [0-255] range and it needs to be scaled to [-1 - 1]. You can try:
image = tf.reshape(image, [128, 64, 1])
image = tf.scalar_mul((1.0/255), image)
image = tf.subtract(image, 0.5)
image = tf.multiply(image, 2.0)

What is the accuracy you are getting with your model for MNIST? It would be helpful if you post the code. Are you using the trained model to evaluate the output for your own data.
A general suggestion on setting up the convolution model is provided here.
Here is the model suggestion according to the article :-
INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC
Having more than one layers of CONV->RELU pair before pooling improves learning complex features. Try with N=2 instead of 1.
Some other suggestions:
While you are preparing your data reduce it to smaller size than 128x64. Try same size as the MNIST data ..
image = tf.reshape(image, [28, 28, 1])
If your eye/noeye image is color, then convert it to greyscale and normalize the values to unity range. You can do this using numpy or tf, here is how using numpy
grayscale-->
img = np.dot(np.array(img, dtype='float32'), [[0.2989],[0.5870],[0.1140]])
normalize-->
mean = np.mean(img, dtype='float32')
std = np.std(img, dtype='float32', ddof=1)
if std < 1e-4: std = 1.
img = (img - mean) / std

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Measuring GPU memory usage with TensorFlow profiler - tensorflow

Related

Tensorflow v1 The name 'features:0' refers to a Tensor which does not exist

Combine tf.keras.layers with Tensorflow low level API

Create predictions with custom CNN using tfrecord input

How to use the trained tensorflow network to run inference?

Fully Convolutional Network, Training Error

Categories

Resources