I know how to train a network on a single GPU, save a checkpoint, later load that checkpoint, and run benchmarks.
I can't figure out how to do the same when I train on multiple GPUs using the new Dataset API.
Here is the 'normal' training code:
import tensorflow as tf
images_placeholder = tf.placeholder(tf.float32,
                                    shape=(None, image_size, image_size, 1),
                                    name='input')
labels_placeholder = tf.placeholder(tf.int32, shape=(None,))  # (None) without the comma would mean an unconstrained shape
embeddings = build_graph(images_placeholder)
loss = add_loss(embeddings, labels_placeholder)
embeddings = tf.identity(embeddings, 'embeddings')
Later on, when I want to benchmark:
with tf.Graph().as_default():
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        load_graph_def(model_path)  # for example: d:\model.ckpt-0
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        images = benchmark_utils.load_data(paths_batch, image_size)
        feed_dict = {images_placeholder: images}
        predictions = sess.run(embeddings, feed_dict=feed_dict)
So now I want to train with multiple GPUs like so:
with tf.Graph().as_default(), tf.device('/cpu:0'):
    dataset = tf.data.Dataset.from_tensor_slices((images_list, labels_list))
    dataset = dataset.map(load_images)
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(128)
    dataset = dataset.repeat()
    iterator = dataset.make_one_shot_iterator()

    opt = tf.train.MomentumOptimizer(0.01, momentum=0.9, use_nesterov=True)
    tower_grads = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
                    # each call to get_next() gives this tower its own batch
                    image_batch, label_batch = iterator.get_next()
                    loss = tower_loss(scope, image_batch, label_batch)
What I can't figure out is how to get the 'input' and 'embeddings' tensors when I want to benchmark the checkpoint.
How do I define, for example, the tensor called 'input' that should receive the images to be evaluated?
I'm guessing that somewhere in the multi-GPU code I should define this images_placeholder the way I did in the single-GPU training.
Thanks for any advice!
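One pattern that may work here (a sketch under assumptions, not tested against the code above, and ignoring the per-tower split for brevity): wrap the iterator output in tf.placeholder_with_default, so the pipeline feeds the graph during training, but the same named tensor can be fed directly at benchmark time:

# Sketch only: wraps the Dataset output in a feedable tensor named 'input'.
iterator = dataset.make_one_shot_iterator()
image_batch, label_batch = iterator.get_next()

# During training this evaluates to the next batch from the pipeline; at
# benchmark time it can be fed via feed_dict, like the single-GPU placeholder.
images_placeholder = tf.placeholder_with_default(
    image_batch, shape=(None, image_size, image_size, 1), name='input')

embeddings = build_graph(images_placeholder)
embeddings = tf.identity(embeddings, 'embeddings')

With that, the single-GPU benchmark code should work unchanged, since the tensors 'input:0' and 'embeddings:0' both exist in the restored graph.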
I have a simple tensorflow model that consists of LSTM layers, such as tf.contrib.rnn.LSTMBlockCell or tf.keras.layers.LSTM (I can provide sample code if needed). I want to run the model in an iOS app. However, several websites I have looked at say that there is presently no way to convert and run a tensorflow model that contains LSTM layers in iOS apps.
I have tried these tools/libraries to convert the tensorflow model to .mlmodel format (or .tflite format):
1. Swift for Tensorflow
2. Tensorflow Lite for iOS
3. tfcoreml
However, these tools also do not seem to be able to convert a model with LSTM layers to .mlmodel format. They do allow custom layers to be added, but I don't know how to add an LSTM custom layer.
Am I wrong in saying that there is no support for running a tensorflow LSTM model in iOS apps? If so, please guide me on how I can include the model in an iOS app. Is there any other tool/library that can be used to convert it to .mlmodel format? If not, are there any plans to add tensorflow support for iOS in the future?
Model
import tensorflow as tf
from tensorflow.contrib import rnn

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Summary parameters
logs_path = "logs/"

# Training parameters
learning_rate = 0.001
training_steps = 1000
batch_size = 128
display_step = 200

# Network parameters
num_input = 28    # MNIST data input (img shape: 28*28)
timesteps = 28    # timesteps
num_hidden = 128  # number of features in the hidden layer
num_classes = 10  # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))
}

def RNN(x, weights, biases):
    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, num_input)
    x = tf.unstack(x, timesteps, 1)
    # Define an LSTM cell
    lstm_cell = rnn.LSTMBlockCell(num_hidden, forget_bias=1.0)
    # lstm_cell = tf.keras.layers.LSTMCell(num_hidden, unit_forget_bias=True)
    # Get the LSTM cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using the last output of the rnn inner loop
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

logits = RNN(X, weights, biases)

with tf.name_scope('Model'):
    prediction = tf.nn.softmax(logits, name="prediction_layer")
with tf.name_scope('Loss'):
    # Define the loss
    loss_op = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y, name="loss_layer"),
        name="reduce_mean_loss")
with tf.name_scope('SGD'):
    # Define the optimizer
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate, name="Gradient_Descent")
    train_op = optimizer.minimize(loss_op, name="minimize_layer")
with tf.name_scope('Accuracy'):
    # Evaluate the model (with test logits, dropout disabled)
    correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1), name="correct_pred_layer")
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name="reduce_mean_acc_layer")

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Create summaries to monitor the loss and accuracy tensors
tf.summary.scalar("loss", loss_op)
tf.summary.scalar("accuracy", accuracy)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

saver = tf.train.Saver()
save_path = ""
model_save = "model.ckpt"

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    # Op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    for step in range(1, training_steps + 1):
        total_batch = int(mnist.train.num_examples / batch_size)
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 sequences of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run the optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc, summary = sess.run([loss_op, accuracy, merged_summary_op],
                                          feed_dict={X: batch_x, Y: batch_y})
            # Write logs at every displayed iteration
            summary_writer.add_summary(summary, step * total_batch)
            print("Step " + str(step) + ", Minibatch Loss= " +
                  "{:.4f}".format(loss) + ", Training Accuracy= " +
                  "{:.3f}".format(acc))
    saver.save(sess, model_save)
    tf.train.write_graph(sess.graph_def, save_path, 'save_graph.pbtxt')
    # print(sess.graph.get_operations())
    print("Optimization Finished!")
    print("Run the command line:\n"
          "--> tensorboard --logdir=logs/ "
          "\nThen open http://0.0.0.0:6006/ in your web browser")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))
Code to generate the frozen model
import tensorflow as tf
import numpy as np
from tensorflow.python.tools import freeze_graph

save_path = ""
model_name = "test_model_tf_keras_layers_lstm"
input_graph_path = save_path + "save_graph.pbtxt"
checkpoint_path = save_path + "model.ckpt"
input_saver_def_path = ""
input_binary = False
output_node_names = "Model/prediction_layer"  # output node's name; must match the name given in the model code
restore_op_name = 'save/restore_all'
filename_tensor_name = 'save/const:0'
output_frozen_graph_name = save_path + 'frozen_model_' + model_name + '.pb'  # desired name of the .pb file
clear_devices = True

freeze_graph.freeze_graph(input_graph_path, input_saver_def_path, input_binary,
                          checkpoint_path, output_node_names, restore_op_name,
                          filename_tensor_name, output_frozen_graph_name,
                          clear_devices, "")
print("Model frozen")
Conversion Code to generate .mlmodel format file
import tfcoreml

coreml_model = tfcoreml.convert(tf_model_path='frozen_model_test_model_tf_keras_layers_lstm.pb',
                                mlmodel_path='test_model.mlmodel',
                                output_feature_names=['Model/prediction_layer:0'],
                                add_custom_layers=True)
coreml_model.save("test_model.mlmodel")
Error message shown with
lstm_cell = rnn.BasicLSTMCell(num_hidden, name = "lstm_cell")
Value Error: Split op case not handled. Input shape = [1, 512], output shape = [1, 128]
Error message shown with
lstm_cell = rnn.LSTMBlockCell(num_hidden, name = "lstm_cell")
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'LSTMBlockCell' used by node rnn/lstm_cell/LSTMBlockCell (defined at /anaconda2/lib/python2.7/site-packages/tfcoreml/_tf_coreml_converter.py:153) with these attrs: [forget_bias=1, use_peephole=false, cell_clip=-1, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
<no registered kernels>
[[node rnn/lstm_cell/LSTMBlockCell (defined at /anaconda2/lib/python2.7/site-packages/tfcoreml/_tf_coreml_converter.py:153) ]]
I expect that the frozen tensorflow model can be converted to .mlmodel format.
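For what it's worth, the LSTMBlockCell error above says the converter's CPU-only environment has no kernel for that fused op. One thing that might be worth trying (an untested sketch; the Split-op limitation seen with BasicLSTMCell may still apply) is rebuilding and re-freezing the graph with a plain, non-fused cell:

# Untested sketch: a non-fused LSTM cell avoids the missing LSTMBlockCell
# kernel at conversion time; the Split-op error may still require a custom layer.
lstm_cell = tf.nn.rnn_cell.LSTMCell(num_hidden, forget_bias=1.0)
outputs, states = tf.nn.static_rnn(lstm_cell, x, dtype=tf.float32)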
I have trained a squeeze-and-excitation convolutional neural network on the MNIST dataset and saved the trained model in a folder.
To use the trained model in production, I restored it and tried to predict on new data, but I get the following error message.
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must
feed a value for placeholder tensor 'Placeholder' with dtype bool
[[{{node Placeholder}}]]
[[accuracy/accuracy_var/_3191]]
Model restore code:
import tensorflow as tf
from cifar10 import *

init_learning_rate = 0.1

train_x, train_y, test_x, test_y = prepare_data()
train_x, test_x = color_preprocessing(train_x, test_x)

tf.reset_default_graph()
saver = tf.train.import_meta_graph("./model/ResNeXt.ckpt.meta")

with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('./model')
    if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
        saver.restore(sess, ckpt.model_checkpoint_path)
        graph = tf.get_default_graph()
        for n in graph.as_graph_def().node:
            print(n.name)
        cost = graph.get_tensor_by_name("cost_reduce_mean:0")
        accuracy = graph.get_tensor_by_name("accuracy/accuracy_var:0")
        test_batch_x = test_x[0:1000]
        test_batch_y = test_y[0:1000]
        x = graph.get_tensor_by_name("placeholder_x_input:0")
        label = graph.get_tensor_by_name("placeholder_label_input:0")
        training_flag = tf.placeholder(tf.bool, name='tr_fg_99999')
        learning_rate = tf.placeholder(tf.float32, name='learning_rate_99999')
        test_feed_dict = {x: test_batch_x, label: test_batch_y, learning_rate: 0.1, training_flag: False}
        loss_, acc_ = sess.run([cost, accuracy], feed_dict=test_feed_dict)
        total_acc = acc_
The error is raised as soon as the line loss_, acc_ = sess.run([cost, accuracy], feed_dict=test_feed_dict) executes.
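A note on what typically causes this error (a sketch of a likely fix, not confirmed against this model): the restored graph already contains its own bool placeholder, and the two tf.placeholder calls above create brand-new tensors instead of fetching the existing ones, so the original placeholder is never fed. The usual pattern is to look the existing placeholders up by name; the names below are assumptions that would have to be checked against the printed node names:

# Sketch: fetch the placeholders that already exist in the restored graph
# instead of creating new ones. 'Placeholder:0' comes from the error message;
# the learning-rate name is hypothetical.
training_flag = graph.get_tensor_by_name("Placeholder:0")
learning_rate = graph.get_tensor_by_name("Placeholder_1:0")
test_feed_dict = {x: test_batch_x, label: test_batch_y,
                  learning_rate: 0.1, training_flag: False}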
I am training a GAN model. For loading the dataset, I am using the Dataset API of TensorFlow.
# train_dataset has images and labels. z_train has the noise (z).
train_dataset = tf.data.TFRecordDataset(train_file)
z_train = tf.data.Dataset.from_tensor_slices(
    tf.random_uniform([total_training_samples, seq_length, z_dim],
                      minval=0, maxval=1, dtype=tf.float32))
train_dataset = tf.data.Dataset.zip((train_dataset, z_train))
Creating Iterator:
# 'iterator' rather than 'iter', which shadows the Python builtin
iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
Using the iterator:
(img, label), z = iterator.get_next()
train_init_op = iterator.make_initializer(train_dataset)
While training the GAN in the session, I train the discriminator first:
_, disc_loss_val = sess.run([disc_optim, disc_loss])
then the generator:
_, gen_loss_val = sess.run([gen_optim, gen_loss])
Here is the catch: since I am using the label as a condition (CGAN) in both the discriminator and the generator graph, the two sess.run calls produce two different batches of labels within the same pass over a batch.
sess.run(tf.global_variables_initializer())  # initialize variables once, outside the epoch loop
for epoch in range(num_of_epochs):
    sess.run(train_init_op)  # make_initializer already returns the init op
    for batch in range(num_of_batches):
        _, disc_loss_val = sess.run([disc_optim, disc_loss])
        _, gen_loss_val = sess.run([gen_optim, gen_loss])
Since I have to feed the same batch of labels to the generator's sess.run as to the discriminator's, how can I prevent the Dataset API from producing two different batches within the same loop iteration?
Note: I am using TensorFlow v1.9
Thanks in advance.
You can create two iterators over the same dataset. If you need to shuffle the dataset, you can even do that by specifying the seed as a tensor. See the example below.
import tensorflow as tf

seed_ts = tf.placeholder(tf.int64)
ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5]).shuffle(
    5, seed=seed_ts, reshuffle_each_iteration=True)

it1 = ds.make_initializable_iterator()
it2 = ds.make_initializable_iterator()
input1 = it1.get_next()
input2 = it2.get_next()

with tf.Session() as sess:
    for ep in range(10):
        # Re-initialize both iterators with the same seed so they
        # produce identical shuffles.
        sess.run(it1.initializer, feed_dict={seed_ts: ep})
        sess.run(it2.initializer, feed_dict={seed_ts: ep})
        print("Epoch " + str(ep))
        for i in range(5):
            x = sess.run(input1)
            y = sess.run(input2)
            print([x, y])
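A simpler alternative for the GAN loop above may also be worth noting (a sketch, with the caveat that the two updates are unordered within a single run, which may or may not suit your training scheme): fetching both train ops in one sess.run evaluates iterator.get_next() only once, so the discriminator and the generator see the same batch:

# Sketch: a single sess.run consumes one batch from the iterator,
# shared by both the discriminator and generator ops.
_, _, disc_loss_val, gen_loss_val = sess.run(
    [disc_optim, gen_optim, disc_loss, gen_loss])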
I am new to tensorflow and I hope you can help me.
I have built a tensorflow CNN network and trained it successfully. The training datasets are matlab arrays. Now I would like to use the trained network to run inference, but I am not sure how to write the python code for it.
During training I saved the model; I am not sure how to load it for inference.
My inference data is also a matlab array, the same as the training data. During training I used minibatches from Tensorlayer; should I use minibatches in inference too?
Below is my inference code; it gives a lot of errors:
print("\n\nPreparing testing data........................")
test_data = sio.loadmat('MyTest.mat')
Z0 = test_data['Real_testing1']
img_num_test = Z0.shape[0]
X_test = np.empty([img_num_test, 128, 128, 1], dtype=float)
X_test[:,:,:,0] = Z0
Y_test = np.column_stack((np.ones([img_num_test, 1], dtype=int),np.zeros([img_num_test, 1], dtype=int)))
print("\tTesting X shape: {0}".format(X_test.shape))
print("\tTesting Y shape: {0}".format(Y_test.shape))
print("\n\Restore the network ...")
save_dir = "checkpoints/";
epoch = 1000
model_name = save_dir + str(epoch) + '_model'
if not os.path.exists(save_dir):
os.makedirs(save_dir)
saver = tf.train.Saver().restore(sess, save_path=model_name)
start_time_begin = time.time()
print("\n\Running network...")
start_time = time.time()
y = model.Scribenet(X_test[0, :, :, :], False, 1.0)
y = sess.run([y], feed_dict=feed_dict)
print(y[0:9])
sess.close()
Below is my training code:
x = tf.placeholder(tf.float32, shape=[None, 128, 128, 1], name='x')
y_ = tf.placeholder(tf.int64, shape=[None, 2], name='y_')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
is_training = tf.placeholder(tf.bool, name='is_traininng')

net_in = x
net_out = model.MyCNN(net_in, is_training, keep_prob)
y = net_out

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_, name='cost'))
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
y_op = tf.argmax(tf.nn.softmax(y), 1)

train_op = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999,
                                  epsilon=1e-08, use_locking=False).minimize(cost)

sess.run(tf.global_variables_initializer())

save_dir = "checkpoints/"
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
saver = tf.train.Saver()

print("\n\nStart training the network ...")
start_time_begin = time.time()
for epoch in range(n_epoch):
    start_time = time.time()
    loss_ep = 0
    n_step = 0
    for X_train_a, y_train_a in tl.iterate.minibatches(X_train, Y_train,
                                                       batch_size, shuffle=True):
        feed_dict = {x: X_train_a, y_: y_train_a, is_training: True, keep_prob: train_keep_prob}
        loss, _ = sess.run([cost, train_op], feed_dict=feed_dict)
        loss_ep += loss
        n_step += 1
    loss_ep = loss_ep / n_step

    if (epoch + 1) % save_freq == 0:
        model_name = save_dir + str(epoch + 1) + '_model'
        saver.save(sess, save_path=model_name)
The main issue is that there is no graph building in your inference code. You either need to save the whole graph (in the SavedModel format), or build the graph in your inference code and load the variables from a training checkpoint (probably the easiest way to start). As long as the variable names are the same, you can load variables saved from the training graph into the inference graph.
So inference will be your training code, but without the y_ placeholder and without the loss/optimizer logic. You can feed a single image (a batch of size 1) to start, so there is no need for batching logic either.
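A rough sketch of what that looks like, reusing the names from the training code above (model.MyCNN, the checkpoint path, and the train-time placeholders are assumptions that must match your actual code):

import numpy as np
import scipy.io as sio
import tensorflow as tf
import model  # assumed: the same module used to build the training graph

# Rebuild the same graph as in training, minus labels/loss/optimizer
x = tf.placeholder(tf.float32, shape=[None, 128, 128, 1], name='x')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
is_training = tf.placeholder(tf.bool, name='is_traininng')
y = model.MyCNN(x, is_training, keep_prob)
probs = tf.nn.softmax(y)

saver = tf.train.Saver()
with tf.Session() as sess:
    # Restore the variables from a training checkpoint
    saver.restore(sess, "checkpoints/1000_model")
    test_data = sio.loadmat('MyTest.mat')['Real_testing1']
    X_test = test_data[..., np.newaxis].astype(np.float32)
    # Feed a single image as a batch of size 1
    preds = sess.run(probs, feed_dict={x: X_test[:1],
                                       is_training: False,
                                       keep_prob: 1.0})
    print(preds)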
I am trying to use Tensorboard to visualize my training procedure. My goal is: every time an epoch completes, I would like to test the network's accuracy on the whole validation dataset and store this accuracy result in a summary file, so that I can visualize it in Tensorboard.
I know Tensorflow has a summary_op to do it, but it seems to only work for one batch when running sess.run(summary_op). I need to calculate the accuracy for the whole dataset. How?
Is there any example of how to do it?
Define a tf.scalar_summary that accepts a placeholder:
accuracy_value_ = tf.placeholder(tf.float32, shape=())
accuracy_summary = tf.scalar_summary('accuracy', accuracy_value_)
Then calculate the accuracy for the whole dataset (define a routine that calculates the accuracy for every batch in the dataset and takes the mean) and save it into a python variable, let's call it va.
Once you have the value of va, just run the accuracy_summary op, feeding the accuracy_value_ placeholder:
sess.run(accuracy_summary, feed_dict={accuracy_value_: va})
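A sketch of that routine, assuming the batch-level accuracy is available as an accuracy op fed through X and y placeholders (the names here are illustrative, not from the original answer):

# Sketch: average the per-batch accuracy over the whole validation set.
# Note: if the last batch is smaller, a mean weighted by batch size is more exact.
def whole_set_accuracy(sess, accuracy_op, X_ph, y_ph, data_x, data_y, batch_size=100):
    accuracies = []
    for start in range(0, len(data_x), batch_size):
        batch_x = data_x[start:start + batch_size]
        batch_y = data_y[start:start + batch_size]
        accuracies.append(sess.run(accuracy_op,
                                   feed_dict={X_ph: batch_x, y_ph: batch_y}))
    return sum(accuracies) / len(accuracies)

va = whole_set_accuracy(sess, acc, X, y, valid_x, valid_y)
sess.run(accuracy_summary, feed_dict={accuracy_value_: va})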
I implemented a naive one-layer model as an example to classify the MNIST dataset and visualize the validation accuracy in Tensorboard; it works for me.
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
import os

# number of epochs
num_epoch = 1000
model_dir = '/tmp/tf/onelayer_model/accu_info'
# mnist dataset location, change if you need
data_dir = '../data/mnist'

# load MNIST dataset without one hot
dataset = read_data_sets(data_dir, one_hot=False)

# Create placeholders for input images X and labels y
X = tf.placeholder(tf.float32, [None, 784])
# one_hot = False
y = tf.placeholder(tf.int32)

# One-layer model graph
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[10]))
logits = tf.nn.relu(tf.matmul(X, W) + b)

init = tf.initialize_all_variables()

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, y)
# loss function
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

_, top_1_op = tf.nn.top_k(logits)
top_1 = tf.reshape(top_1_op, shape=[-1])
correct_classification = tf.cast(tf.equal(top_1, y), tf.float32)
# accuracy function
acc = tf.reduce_mean(correct_classification)

# define the info that is used by the SummaryWriter
acc_summary = tf.scalar_summary('valid_accuracy', acc)
valid_summary_op = tf.merge_summary([acc_summary])

with tf.Session() as sess:
    # initialize all the variables
    sess.run(init)
    print("Writing Summaries to %s" % model_dir)
    train_summary_writer = tf.train.SummaryWriter(model_dir, sess.graph)
    # load the validation dataset
    valid_x = dataset.validation.images
    valid_y = dataset.validation.labels
    for epoch in xrange(num_epoch):
        batch_x, batch_y = dataset.train.next_batch(100)
        feed_dict = {X: batch_x, y: batch_y}
        _, acc_value, loss_value = sess.run(
            [train_op, acc, loss], feed_dict=feed_dict)
        vsummary = sess.run(valid_summary_op,
                            feed_dict={X: valid_x, y: valid_y})
        # Write the validation accuracy summary
        train_summary_writer.add_summary(vsummary, epoch)
Batching your validation set is possible if you are using tf.metrics ops, which use internal counters. Here is a simplified example:
model = create_model()
tf.summary.scalar('cost', model.cost_op)
acc_value_op, acc_update_op = tf.metrics.accuracy(labels, predictions)

summary_common = tf.summary.merge_all()
summary_valid = tf.summary.merge([
    tf.summary.scalar('accuracy', acc_value_op),
    # other metrics here...
])

with tf.Session() as sess:
    train_writer = tf.summary.FileWriter(logs_path + '/train', sess.graph)
    valid_writer = tf.summary.FileWriter(logs_path + '/valid')
While training, only write the common summary using your train-writer:
summary = sess.run(summary_common)
train_writer.add_summary(summary, tf.train.global_step(sess, gstep_op))
train_writer.flush()
After every validation, write both summaries using the valid-writer:
gstep, summaryc, summaryv = sess.run([gstep_op, summary_common, summary_valid])
valid_writer.add_summary(summaryc, gstep)
valid_writer.add_summary(summaryv, gstep)
valid_writer.flush()
When using tf.metrics, don't forget to reset the internal counters (local variables) before every validation step.
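For completeness, that reset is typically just re-running the local-variables initializer (a minimal sketch; a reset op scoped to only the metric variables is cleaner in larger graphs):

# tf.metrics counters live in local variables; reset them before each
# pass over the validation set.
sess.run(tf.local_variables_initializer())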