In my training file (train.py), I write:
def deep_part(self):
    with tf.variable_scope("deep-part"):
        y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.factor_size])  # None * (F*K)
        # self.deep_layers = 2
        for i in range(0, len(self.deep_layers)):
            y_deep = tf.contrib.layers.fully_connected(y_deep, self.deep_layers[i],
                                                       activation_fn=self.deep_layers_activation, scope='fc%d' % i)
        return y_deep
Now, in my prediction file (predict.py), I restore the checkpoint, but I don't know how to reload the "deep-part" network's weights and biases, because I think the fully_connected function might hide them.
I wrote a lengthy explanation here. A short summary:
With saver.save(sess, '/tmp/my_model'), TensorFlow produces multiple files:
checkpoint
my_model.data-00000-of-00001
my_model.index
my_model.meta
The checkpoint file is just a pointer to the latest version of the model weights; it is simply a plain-text file containing:
$ cat /tmp/checkpoint
model_checkpoint_path: "/tmp/my_model"
all_model_checkpoint_paths: "/tmp/my_model"
The others are binary files containing the graph (.meta) and weights (.data*).
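If you want to check what the weight files actually contain, a checkpoint reader lists every saved variable. A small sketch (the path assumes the save above):

import tensorflow as tf

# List every variable name and shape stored in the checkpoint written by saver.save
reader = tf.train.NewCheckpointReader('/tmp/my_model')
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)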
You can see how saving and restoring fit together by running:
import tensorflow as tf
import numpy as np

data = np.arange(9).reshape(1, 9).astype(np.float32)

plhdr = tf.placeholder(tf.float32, shape=[1, 9], name='input')
print(plhdr.name)        # input:0
activation = tf.layers.dense(plhdr, 10, name='fc')
print(activation.name)   # fc/BiasAdd:0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    expected = sess.run(activation, {plhdr: data})
    print(expected)
    saver = tf.train.Saver(tf.global_variables())
    saver.save(sess, '/tmp/my_model')

tf.reset_default_graph()

with tf.Session() as sess:
    # load the computation graph (the fully connected layer + placeholder)
    loader = tf.train.import_meta_graph('/tmp/my_model.meta')
    sess.run(tf.global_variables_initializer())
    plhdr = tf.get_default_graph().get_tensor_by_name('input:0')
    activation = tf.get_default_graph().get_tensor_by_name('fc/BiasAdd:0')

    # with freshly initialized weights the saved output is not reproduced
    actual = sess.run(activation, {plhdr: data})
    assert not np.allclose(actual, expected)

    # now load the weights
    loader.restore(sess, '/tmp/my_model')
    actual = sess.run(activation, {plhdr: data})
    assert np.allclose(actual, expected)
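The same approach applies to the original deep-part question: fully_connected does not hide anything, it just creates variables under the scope you give it. A minimal sketch of finding and fetching them after restoring (the toy checkpoint above is used here; adapt the names to your own scopes):

import tensorflow as tf

tf.reset_default_graph()
with tf.Session() as sess:
    loader = tf.train.import_meta_graph('/tmp/my_model.meta')
    loader.restore(sess, '/tmp/my_model')

    # list every variable so you can see the exact names the layers created
    for v in tf.global_variables():
        print(v.name)   # here: fc/kernel:0 and fc/bias:0

    # fetch a restored weight matrix by name; for the deep part above the names
    # should look like 'deep-part/fc0/weights:0' and 'deep-part/fc0/biases:0'
    kernel = tf.get_default_graph().get_tensor_by_name('fc/kernel:0')
    print(sess.run(kernel))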
Using TensorFlow 1.9, I want to train a neural network in one Python file, and then restore the network using a different Python file. I have tried to do this using a simple example, but when I try to load my "prediction" operation, I receive an error. Specifically, the error is: KeyError: "The name 'prediction' refers to an Operation not in the graph.".
Below is my Python file to train and save the network. It generates some example data and trains a simple neural network, then saves the network every epoch.
import numpy as np
import tensorflow as tf

input_data = np.zeros([100, 10])
label_data = np.zeros([100, 1])
for i in range(100):
    for j in range(10):
        input_data[i, j] = i * j / 1000
    label_data[i] = 2 * input_data[i, 0] + np.random.uniform(0.01)

input_placeholder = tf.placeholder(tf.float32, shape=[None, 10], name='input_placeholder')
label_placeholder = tf.placeholder(tf.float32, shape=[None, 1], name='label_placeholder')

x = tf.layers.dense(inputs=input_placeholder, units=10, activation=tf.nn.relu)
x = tf.layers.dense(inputs=x, units=10, activation=tf.nn.relu)
prediction = tf.layers.dense(inputs=x, units=1, name='prediction')

loss_op = tf.reduce_mean(tf.square(prediction - label_placeholder))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss_op)

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch_num in range(100):
        _, loss = sess.run([train_op, loss_op], feed_dict={input_placeholder: input_data, label_placeholder: label_data})
        print('epoch ' + str(epoch_num) + ', loss = ' + str(loss))
        saver.save(sess, '../Models/model', global_step=epoch_num + 1)
And below is my Python file to restore the network. It loads the input and output placeholders, together with the operation required for making predictions. However, even though I have named an operation as prediction in the training code above, the code below cannot seem to find this operation in the loaded graph.
import tensorflow as tf
import numpy as np

input_data = np.zeros([100, 10])
for i in range(100):
    for j in range(10):
        input_data[i, j] = i * j / 1000

with tf.Session() as sess:
    saver = tf.train.import_meta_graph('../Models/model-99.meta')
    saver.restore(sess, '../Models/model-99')
    graph = tf.get_default_graph()
    input_placeholder = graph.get_tensor_by_name('input_placeholder:0')
    label_placeholder = graph.get_tensor_by_name('label_placeholder:0')
    prediction = graph.get_operation_by_name('prediction')
    pred = sess.run([prediction], feed_dict={input_placeholder: input_data})
Why can this code not find this operation, and what should I do to correct my code?
You have to modify a single line in your loading script (tested with tf 1.8):
prediction = graph.get_tensor_by_name('prediction/BiasAdd:0')
You have to specify which tensor you want to access, as prediction is only the name scope of the dense layer. You can check the exact name during saving with prediction.name. When restoring, use graph.get_tensor_by_name, since you are interested in the tensor's value, not in the operation that produces it.
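If you are unsure of the exact tensor name, you can also list the operations in the restored graph. A small sketch, reusing the checkpoint paths from the question:

import tensorflow as tf

with tf.Session() as sess:
    saver = tf.train.import_meta_graph('../Models/model-99.meta')
    saver.restore(sess, '../Models/model-99')
    graph = tf.get_default_graph()
    # print only the ops under the 'prediction' name scope
    for op in graph.get_operations():
        if op.name.startswith('prediction'):
            print(op.name)   # e.g. prediction/MatMul, prediction/BiasAdd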
When freezing a graph and then running it elsewhere (on a mobile device), the output of my semantic segmentation model is of low quality compared to inference on the server. It is basically a messy version of what runs on the server. It executes successfully, but it appears as though something was not initialized prior to freezing, even though the method used to load the model is nearly identical between the export script and the inference script.
The exported model can be run on the same images over and over and produce the same results for a given set of images, as expected.
However, each time the model is frozen, using the exact same script and checkpoint, it creates a different output for a given set of images.
def main():
    args = get_arguments()
    if args.dataset == 'cityscapes':
        num_classes = cityscapes_class
    else:
        num_classes = ADE20k_class

    shape = [320, 320]
    x = tf.placeholder(dtype=tf.float32, shape=(shape[0], shape[1], 3), name="input")
    img_tf = preprocess(x)

    model = model_config[args.model]
    net = model({'data': img_tf}, num_classes=num_classes, filter_scale=args.filter_scale)
    raw_output = net.layers['conv6_cls']
    raw_output_up = tf.image.resize_bilinear(raw_output, size=shape, align_corners=True)
    raw_output_maxed = tf.argmax(raw_output_up, axis=3, name="output")

    # Init tf Session
    config = tf.ConfigProto()
    sess = tf.Session(config=config)
    init = tf.global_variables_initializer()
    sess.run(init)

    model_path = model_paths[args.model]
    ckpt = tf.train.get_checkpoint_state(model_path)
    if ckpt and ckpt.model_checkpoint_path:
        input_checkpoint = ckpt.model_checkpoint_path
        loader = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)
        load(loader, sess, ckpt.model_checkpoint_path)
    else:
        print('No checkpoint file found at %s.' % model_path)
        exit()
    print("Loaded Model")

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # We use a built-in TF helper to export variables to constants
    output_graph_def = graph_util.convert_variables_to_constants(
        sess,                          # The session is used to retrieve the weights
        input_graph_def,               # The graph_def is used to retrieve the nodes
        output_node_names.split(",")   # The output node names are used to select the useful nodes
    )

    # Finally we serialize and dump the output graph to the filesystem
    with tf.gfile.GFile("model/output_graph.pb", "wb") as f:
        f.write(output_graph_def.SerializeToString())
    print("%d ops in the final graph." % len(output_graph_def.node))
I have read many similar questions and just cannot get this to work properly.
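A hedged sketch of one thing to try (not a confirmed fix): restore directly into the graph built by model(...) with a plain Saver instead of calling import_meta_graph on top of it, so the freshly built (and only randomly initialized) copy of the network is the one that actually receives the checkpoint weights before freezing. The names (sess, model_path, graph_util, "output") are taken from the script above:

# Restore into the constructed graph rather than importing a second copy via .meta
restorer = tf.train.Saver(tf.global_variables())
ckpt = tf.train.get_checkpoint_state(model_path)
restorer.restore(sess, ckpt.model_checkpoint_path)

# Freeze exactly the graph that was just restored
output_graph_def = graph_util.convert_variables_to_constants(
    sess,
    tf.get_default_graph().as_graph_def(),
    ["output"])  # the tf.argmax node above is named "output"

with tf.gfile.GFile("model/output_graph.pb", "wb") as f:
    f.write(output_graph_def.SerializeToString())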
My model is training well and checkpoint files are being made every epoch. I want the program to be able to continue from epoch x once reloaded, and to print which epoch it is on at every iteration. I could simply save this data outside of the checkpoint file, but I also want to do it this way to give me confidence that everything else is being stored properly.
Unfortunately the value in the epoch/global_step variable is always still 0 when I restart.
import tensorflow as tf
import numpy as np
# more imports
def extract_number(f):  # used to get the latest checkpoint file
    s = re.findall(r"epoch(\d+).ckpt", f)
    return (int(s[0]) if s else -1, f)

def restore(init_op, sess, saver):  # called to restore or just initialise the model
    list = glob(os.path.join("./params/e*"))
    if list:
        file = max(list, key=extract_number)
        saver.restore(sess, file[:-5])
    sess.run(init_op)
    return

with tf.Graph().as_default() as g:
    # build models

    total_batch = data.train.num_examples / batch_size
    epochLimit = 51

    saver = tf.train.Saver()
    init_op = tf.global_variables_initializer()

    with tf.Session() as sess:
        saver = tf.train.Saver()
        init_op = tf.global_variables_initializer()
        restore(init_op, sess, saver)

        epoch = global_step.eval()
        while epoch < epochLimit:
            total_batch = data.train.num_examples / batch_size
            for i in range(int(total_batch)):
                sys.stdout.flush()
                voxels = newData.eval()
                batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)

                sess.run(opt_G, feed_dict={z: batch_z, train: True})
                sess.run(opt_D, feed_dict={input: voxels, z: batch_z, train: True})

                with open("out/loss.csv", 'a') as f:
                    batch_loss_G = sess.run(loss_G, feed_dict={z: batch_z, train: False})
                    batch_loss_D = sess.run(loss_D, feed_dict={input: voxels, z: batch_z, train: False})
                    msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
                    print(msgOut)

            epoch = epoch + 1
            sess.run(global_step.assign(epoch))
            saver.save(sess, "params/epoch{0}.ckpt".format(epoch))

            batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
            voxels = sess.run(x_, feed_dict={z: batch_z})
            v = voxels[0].reshape([32, 32, 32]) > 0
            util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)
I also update the global step variable using assign at the bottom. Any ideas? Any help would be greatly appreciated.
When you call sess.run(init_op) after restoring, it resets all variables to their initial values. Comment that line out and things should work.
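For clarity, here is a minimal sketch of the corrected helper (same names as in the question): run init_op only when there is no checkpoint to restore.

def restore(init_op, sess, saver):  # restore if a checkpoint exists, otherwise initialise
    ckpts = glob(os.path.join("./params/e*"))
    if ckpts:
        latest = max(ckpts, key=extract_number)
        saver.restore(sess, latest[:-5])   # strip the ".meta" suffix
        return                             # do NOT run init_op after restoring
    sess.run(init_op)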
My original code was wrong for several reasons, because I was trying so many things. The first responder, Alexandre Passos, makes a valid point, but I believe what changed the game was also the use of scopes (maybe?).
Below is the working updated code if it helps anyone:
import tensorflow as tf
import numpy as np
# more imports

def extract_number(f):  # used to get the latest checkpoint file
    s = re.findall(r"epoch(\d+).ckpt", f)
    return (int(s[0]) if s else -1, f)

def restore(sess, saver):  # called to restore or just initialise the model
    list = glob(os.path.join("./params/e*"))
    if list:
        file = max(list, key=extract_number)
        saver.restore(sess, file[:-5])
        return saver, True, sess
    saver = tf.train.Saver()
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    return saver, False, sess

batch_size = 100
learning_rate = 0.0001
beta1 = 0.5
z_size = 100
save_interval = 1

data = dataset.read()
total_batch = data.train.num_examples / batch_size

def fill_queue():
    for i in range(int(total_batch * epochLimit)):
        sess.run(enqueue_op, feed_dict={batch: data.train.next_batch(batch_size)})  # running in a separate thread to feed a FIFOQueue

with tf.variable_scope("glob"):
    global_step = tf.get_variable(name='global_step', initializer=0, trainable=False)

# build models

epochLimit = 51
saver = tf.train.Saver()

with tf.Session() as sess:
    saver, rstr, sess = restore(sess, saver)

    with tf.variable_scope("glob", reuse=True):
        epocht = tf.get_variable(name='global_step', trainable=False, dtype=tf.int32)
    epoch = epocht.eval()

    while epoch < epochLimit:
        total_batch = data.train.num_examples / batch_size
        for i in range(int(total_batch)):
            sys.stdout.flush()
            voxels = newData.eval()
            batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)

            sess.run(opt_G, feed_dict={z: batch_z, train: True})
            sess.run(opt_D, feed_dict={input: voxels, z: batch_z, train: True})

            with open("out/loss.csv", 'a') as f:
                batch_loss_G = sess.run(loss_G, feed_dict={z: batch_z, train: False})
                batch_loss_D = sess.run(loss_D, feed_dict={input: voxels, z: batch_z, train: False})
                msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
                print(msgOut)

        epoch = epoch + 1
        sess.run(global_step.assign(epoch))
        saver.save(sess, "params/epoch{0}.ckpt".format(epoch))

        batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
        voxels = sess.run(x_, feed_dict={z: batch_z})
        v = voxels[0].reshape([32, 32, 32]) > 0
        util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)
I am running TensorFlow 0.12.1 on a GPU. I have a trained deep CNN model whose weights I've saved in a checkpoint file. During inference, I reload the saved checkpoint using restorer.restore(sess, tf.train.latest_checkpoint(FLAGS.train_dir)). The code seems to run without issues, but every time I re-run the script I get garbled outputs. As far as I know, I do not shuffle my test set inputs; they are being loaded and fed to the network properly. Yet different runs of the CNN on the same test set, in the same order, produce very different outputs. I'm perplexed! Also, how do I execute a graph loaded from a saved checkpoint without running an init_op during inference? It seems my code requires all global and local variables to be initialized before execution (I initialize first, and only then restore the checkpoint!). Here's a snippet of my code:
import tensorflow as tf
import numpy as np
import os
import os.path
from datetime import datetime
import time
import random
import json
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes
from modelFCNN3 import model
def read_input(inp_queue, height=224, width=224, channels=3, mask=False):
    value = tf.read_file(inp_queue)
    image = tf.image.decode_png(value)
    image = tf.image.resize_images(image, [height, width], method=2)
    image = tf.cast(image, tf.uint8)
    image.set_shape([height, width, channels])
    image = tf.reshape(image, [height, width, channels])
    if mask:
        image = tf.to_float(tf.greater_equal(image, 128))
        image = tf.cast(image, tf.float32)
    else:
        image = tf.image.per_image_standardization(image)
        image = tf.cast(image, tf.float32)
    return image
if __name__ == '__main__':
    tf.reset_default_graph()

    with open('X_test.json', 'r') as infile:
        X_test = json.load(infile)
    with open('y_test.json', 'r') as infile:
        y_test = json.load(infile)

    imagelist = ops.convert_to_tensor(X_test, dtype=dtypes.string)
    labellist = ops.convert_to_tensor(y_test, dtype=dtypes.string)
    input_queue = tf.train.slice_input_producer([imagelist, labellist],
                                                num_epochs=1,
                                                shuffle=False)

    image = read_input(input_queue[0], height=224, width=224, channels=3, mask=False)
    label = read_input(input_queue[1], height=224, width=224, channels=1, mask=True)
    images_batch, labels_batch = tf.train.batch([image, label], batch_size=FLAGS.batch_size,
                                                enqueue_many=False, shapes=None, allow_smaller_final_batch=True)

    global_step = tf.Variable(0, trainable=False)

    images = tf.placeholder_with_default(images_batch, shape=[None, 224, 224, 3])
    labels = tf.placeholder_with_default(labels_batch, shape=[None, 224, 224, 1])

    restorer = tf.train.Saver()

    logits = model(images).logits
    labels = tf.cast(labels, tf.int32)
    labels.set_shape([FLAGS.batch_size, 224, 224, 1])

    valid_prediction = tf.argmax(tf.nn.softmax(logits), dimension=3)
    valid_prediction.set_shape([FLAGS.batch_size, 224, 224])

    meanIOU, update_op_mIOU = tf.contrib.metrics.streaming_mean_iou(tf.cast(valid_prediction, tf.int32),
                                                                    tf.squeeze(labels), FLAGS.num_classes)

    init = tf.global_variables_initializer()
    init_locals = tf.local_variables_initializer()

    with tf.Session() as sess:
        sess.run([init, init_locals])
        restorer.restore(sess, tf.train.latest_checkpoint(FLAGS.train_dir))
        print("Model restored.")

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord, sess=sess)
        summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)

        try:
            step = 0
            avg = []
            while not coord.should_stop():
                myimg, predimg, mylbl = sess.run([images, valid_prediction, labels])
                mIOU, _ = sess.run([meanIOU, update_op_mIOU])
                avg.append(mIOU)
                step += 1
        except tf.errors.OutOfRangeError:
            print('Done training -- epoch limit reached')
        finally:
            coord.request_stop()
            coord.join(threads)
            sess.close()
Are you running on the same machine or on a different machine?

#saver = tf.train.Saver()

The following note is in the TensorFlow docs:

#NOTE: Restarting training from saved meta_graph only works if the device assignments have not changed.

#saver = tf.train.import_meta_graph(metafile)
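Beyond device placement, one thing worth double-checking in the snippet above (a guess, not a confirmed diagnosis): tf.train.Saver() only covers the variables that exist when it is constructed, and in the posted code restorer = tf.train.Saver() is created before model(images) builds the network, so the restore may only be touching global_step while the convolution weights keep their random initialization, which would explain different outputs on every run. A minimal sketch of the usual inference pattern, assuming every global variable is present in the checkpoint:

logits = model(images).logits        # build the full model first
restorer = tf.train.Saver()          # now the Saver covers the model weights

with tf.Session() as sess:
    # only local variables (streaming_mean_iou counters, the num_epochs counter
    # of slice_input_producer) need initializing; the rest comes from the checkpoint
    sess.run(tf.local_variables_initializer())
    restorer.restore(sess, tf.train.latest_checkpoint(FLAGS.train_dir))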
TensorBoard provides embedding visualization of TensorFlow variables through tf.train.Saver(). The following is a working example (from this answer):
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = '/tmp/emb_logs/'
metadata = os.path.join(LOG_DIR, 'metadata.tsv')

mnist = input_data.read_data_sets('MNIST_data')

# Variables
images = tf.Variable(mnist.test.images, name='images')
weights = tf.Variable(tf.random_normal([3, 3, 1, 16]))
biases = tf.Variable(tf.zeros([16]))

# Tensor built from the variables
x = tf.reshape(images, shape=[-1, 28, 28, 1])
conv_layer = tf.nn.conv2d(x, weights, [1, 1, 1, 1], padding="SAME")
conv_layer = tf.add(conv_layer, biases)
y = tf.reshape(conv_layer, shape=[-1, 28*28*16])

with open(metadata, 'w') as metadata_file:
    for row in mnist.test.labels:
        metadata_file.write('%d\n' % row)

with tf.Session() as sess:
    saver = tf.train.Saver([images])
    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))

    config = projector.ProjectorConfig()
    # One can add multiple embeddings.
    embedding = config.embeddings.add()
    embedding.tensor_name = images.name
    # Link this tensor to its metadata file (e.g. labels).
    embedding.metadata_path = metadata
    # Saves a config file that TensorBoard will read during startup.
    projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)
How can I visualize embeddings from a tensorflow tensor, like y in the code above?
Simply replacing
saver = tf.train.Saver([images])
with
saver = tf.train.Saver([y])
doesn't work, because of the following error:
474 var = ops.convert_to_tensor(var, as_ref=True)
475 if not BaseSaverBuilder._IsVariable(var):
476 raise TypeError("Variable to save is not a Variable: %s" % var)
477 name = var.op.name
478 if name in names_to_saveables:
TypeError: Variable to save is not a Variable: Tensor("Reshape_11:0", shape=(10000, 12544), dtype=float32)
Does anyone know of an alternative to generate tensorboard embedding visualizations of a tf.tensor?
You could create a new variable and assign the value of y to it.
# a variable with the same shape and dtype as y, so it can hold y's value
y_var = tf.Variable(tf.zeros_like(y))
saver = tf.train.Saver([y_var])
assign_op = y_var.assign(y)
...
# Training: run the assign op alongside the training fetches, then save
sess.run((loss, assign_op))
saver.save(...)
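After saving, point the projector config at the new variable, just as the original example does for images. A sketch reusing the LOG_DIR and metadata paths from above:

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = y_var.name
embedding.metadata_path = metadata
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)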
tf.summary.tensor_summary will save a summary for an arbitrary Tensor.
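For completeness, a minimal usage sketch (note that a tensor summary writes the raw tensor values to the event file; it does not feed the embedding projector):

# Write the value of y to an event file as a tensor summary.
summary_op = tf.summary.tensor_summary('y_embedding', y)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(LOG_DIR, sess.graph)
    writer.add_summary(sess.run(summary_op), global_step=0)
    writer.close()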