How to use test data on a saved model with the queue approach (without feed_dict) in TensorFlow?

I am new to TensorFlow. I have built a convnet for MNIST image classification. I am using queues to read images (PNG) from disk, batch them, and pass them to the train op (I am quite comfortable with this now). Everything is fine up to training, and I evaluate my accuracy op every certain number of steps while training.
I am saving the model with a Saver object and can see the meta and checkpoint files being written to disk.
Now the real challenge is to restore the model once it has finished training and use it for predictions on new images.
One of the first steps in my (training) graph is the line below, which takes x_image (images from the train queue):

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
As I am not using the feed-dict approach, I cannot simply restore the accuracy op with the saver and feed it the new data. I have to define a queue for the test data and rebuild the graph (exactly as before) with x_image changed to point to the test-data queue.
How can I now restore the weights learned during training and use them with this new graph to simply run my predict/accuracy op?
I tried to follow the tutorial at https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py but got lost in the eval code.
Also, if I add a dummy constant to my training graph and then try to retrieve its value, I am able to retrieve it.
Can anyone please help? Thanks.

OK, so I have found the answer.
The original challenge was to toggle between train and test data across the training and validation phases when using queues.
Since queues are part of the graph structure, we can't simply modify them.
I found an article that uses tf.case to toggle between the train and test queues, but I wasn't able to use shuffle_batch along with it.
The real task at hand was to save the model after training and use the saved model for prediction in production.
So here is the flow:

Training
- Create a method that builds your graph (it takes an image tensor as input).
- Build the training graph by passing in training image batches.
- Perform the training and save the model with a Saver object.

Evaluation
- Reconstruct the same graph, this time with test image batches.
- In the session, use the Saver object to restore the weights (note that you don't need to specify which variables to restore; by default it restores all saveable variables).
- Don't run the global variable initializer at this point.
- Run your predict op (built from the newly constructed graph).
- Also make sure you switch off dropout during evaluation, as it would keep varying the output for the same input (a sketch of this follows the pseudocode below).
Below is the pseudocode for training:
train_op, y_predict, accuracy = create_graph(train_input, train_label)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    model_saver = tf.train.Saver()

    for i in range(2000):
        if i % 100 == 0:
            train_accuracy = sess.run(accuracy)
            print("step %d, training accuracy %f" % (i, train_accuracy))
        sess.run(train_op)

    print(sess.run(accuracy))
    model_saver.save(sess, 'model/simple_model', global_step=100)

    coord.request_stop()
    coord.join(threads)
And for evaluation:
_, y_predict, accuracy = create_graph(test_input, test_label)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("./model/"))
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    label_predict = sess.run([y_predict])
    coord.request_stop()
    coord.join(threads)
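One detail the pseudocode glosses over is the dropout switch mentioned in the evaluation steps. A minimal sketch, assuming create_graph takes a keep_prob argument (an assumption for illustration; the body below is a toy stand-in, not the original convnet):

import tensorflow as tf

def create_graph(image_batch, label_batch, keep_prob=1.0):
    # Toy stand-in for the real graph builder; keep_prob controls dropout.
    flat = tf.reshape(image_batch, [-1, 28 * 28])
    hidden = tf.layers.dense(flat, 128, activation=tf.nn.relu)
    hidden = tf.nn.dropout(hidden, keep_prob)   # no-op when keep_prob == 1.0
    logits = tf.layers.dense(hidden, 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=label_batch, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
    y_predict = tf.argmax(logits, axis=1, output_type=tf.int32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(y_predict, label_batch), tf.float32))
    return train_op, y_predict, accuracy

# Training script: dropout active.
# train_op, y_predict, accuracy = create_graph(train_input, train_label, keep_prob=0.5)
# Evaluation script: dropout switched off.
# _, y_predict, accuracy = create_graph(test_input, test_label, keep_prob=1.0)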

Related

tensorflow restores different values for weights each time (from the same file!)

So I'm training a model on a machine with a GPU. Of course I save it at the end of training:
a = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
saver = tf.train.Saver(a)
saver.save(sess, save_path)
Now I have one file, but every time I restore the model from the same file I get different numbers in the matrices, and different predictions for the same examples.
I restore the model like this:
saver = tf.train.import_meta_graph('{}.meta'.format(save_path))
sess.run(tf.global_variables_initializer())
saver.restore(sess, save_path)
What is happening here?
When you call sess.run(tf.global_variables_initializer()) after importing the frozen graph, you probably reinitialise some variables that you should not.
Instead, you should initialise only the uninitialised variables. One way to do it would be (credit to this answer)
uninitialized_vars = []
for var in tf.all_variables():
    try:
        sess.run(var)
    except tf.errors.FailedPreconditionError:
        uninitialized_vars.append(var)

init_new_vars_op = tf.initialize_variables(uninitialized_vars)
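An alternative sketch of the same idea, using tf.report_uninitialized_variables instead of catching exceptions (this is a variant added for illustration, not the quoted answer's code):

import tensorflow as tf

def initialize_uninitialized(sess):
    # Names of variables the session has not initialised yet (returned as bytes).
    uninit_names = set(sess.run(tf.report_uninitialized_variables()))
    uninit_vars = [v for v in tf.global_variables()
                   if v.name.split(':')[0].encode() in uninit_names]
    if uninit_vars:
        sess.run(tf.variables_initializer(uninit_vars))

# Typical order: restore first, then top up whatever the checkpoint did not cover.
# saver.restore(sess, save_path)
# initialize_uninitialized(sess)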

Restoring a model with tensorflow does not lead to same prediction

My question revolves around TensorFlow and its tf.train.Saver() function, which I use to save the state of a network and restore it afterwards. My code is very long and messy, but it is structured as:
- Generate a set of parameter settings for the network
- Build the network
- Either train the network or restore it based on a saved model file
- Predict on some external data
Now to my question. When I run my model with the exact same parameters several times, I get the exact same performance. If I restore the same model several times, I also get the same performance. However, the performance when I train the model and when I restore the model are not the same, even if the restored model comes directly from the trained model. I have tried saving either the whole model (tf.train.Saver(), which creates huge files) or just the trainable variables (tf.train.Saver(tf.trainable_variables()), which makes much smaller files); both give the same result, but it is still not identical to training the model directly. Keep in mind that the differences are generally very small, and for some of the individual tasks in my network they are identical, but the difference bugs me.
I have seen several questions about TensorFlow model saving and restoring, but none seem to solve my problem. As far as I can see the model restores correctly (I checked that the weights are the same, and they seem to be), but it doesn't give the exact same results. I think I can rule out random events in my code, because while training and restoring I can reproduce the same results given the same parameters. I am not sure how to proceed.
Does anyone know why I have this problem?
I have added just small snippets of the code below. First, how I restore the model:
def load_network_state(model_file):
    print('Restoring model')
    # saver = tf.train.Saver(tf.trainable_variables())
    saver = tf.train.Saver()
    sess = tf.Session()
    saver.restore(sess, model_file)
    return sess
In the training function:
# saver = tf.train.Saver(tf.trainable_variables())
saver = tf.train.Saver()
if model_file is not None:
    save_path = saver.save(sess, model_file + '_{0}'.format(model_suffix))
In the main loop:
if args['restore_network']:
    params = create_parameters_from_file(args['arg_file'], args, j)
else:
    params = create_random_parameter(args)

start = time.time()
train_data, test_data = transform(train_data, test_data, params)
kwargs = generate_kwargs_dictionary(generate_network, params)
features, targets, predictions, train_op = generate_network(train_data, test_data, model_suffix=j, **kwargs)

if args['restore_network']:
    session = load_network_state(params['model_file'] + '_{0}'.format(j), params['seed'])
else:
    kwargs = generate_kwargs_dictionary(network_training, params)
    session = network_training(train_data, features, targets, train_op, model_suffix=j, **kwargs)

kwargs = generate_kwargs_dictionary(network_prediction, params)
training_results = network_prediction(train_data, features, targets, predictions, session, model_suffix=j, **kwargs)
test_results = network_prediction(test_data, features, targets, predictions, session, model_suffix=j, **kwargs)
elapsed = time.time() - start
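For reference, the weight check mentioned above can be done with a small spot-check like the sketch below (my own illustration, not part of the original code); given the same graph-building code, the variable names and order are identical in the trained and restored sessions, so the printed values should match.

import tensorflow as tf

def dump_weights(sess, n=5):
    # Print the first few values of every trainable variable, for comparison
    # between the freshly trained session and the restored one.
    for var in tf.trainable_variables():
        print(var.name, sess.run(var).flatten()[:n])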
Thank you for any help you could provide

Tensorflow supervisor for both training and evaluating operations?

I've been using the TensorFlow Supervisor (https://www.tensorflow.org/programmers_guide/supervisor) to load the model from saved checkpoints, both for training and for running a network. But I noticed that the checkpoint files get updated even when only running/evaluating the model (the timestamps of graph.pbtxt and the model.ckpt.data files are updated, and a new events.out file is created).
This makes me wonder: does using a Supervisor for running/evaluating the model reset or alter the trained state as well? Is it advisable to use a Supervisor for anything other than training?
Train:

sv = tf.train.Supervisor(logdir=mylogdir)
with sv.managed_session() as sess:
    if not sv.should_stop():
        train_step.run(feed_dict={x: xtrain, y_: ytrain, keep_prob: 0.5}, session=sess)
Run/evaluate only (we don't want the code below to modify the trained state of the model):
sv = tf.train.Supervisor(logdir=mylogdir)
with sv.managed_session() as sess:
    for yconv in sess.run(y_conv, feed_dict={x: xtest, keep_prob: 1.0}):
        pass  # use yconv to predict, evaluate, etc.
Your model is usually saved to a file named 'model.ckpt-NUM'. As long as the evaluation does not update that file (and it shouldn't), you're safe.
If you are worried about overwriting logs/summaries, you should be careful when choosing the summary names.
For example, for evaluation choose a summary name of 'eval/' + metric_name, and for training 'train/' + metric_name; for examples, see here and here.
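A minimal sketch of that naming scheme (the accuracy tensor is assumed to be the one from the question's graph):

# Prefixed summary names keep train and eval metrics apart in TensorBoard.
train_acc_summary = tf.summary.scalar('train/accuracy', accuracy)
eval_acc_summary = tf.summary.scalar('eval/accuracy', accuracy)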
EDIT:
You can also choose a different directory ("logdir") for storing the evaluation results, as in the API shown here.
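As a hedged sketch of keeping an evaluation run from touching the trained state: save_model_secs=0 and save_summaries_secs=0 disable the Supervisor's periodic checkpoint and summary services, and the '/eval' sub-directory below is an assumption for where to send evaluation summaries.

# Evaluation-only Supervisor: it still restores from mylogdir's checkpoint, but
# the background services that would rewrite model.ckpt or the training
# summaries are switched off.
sv = tf.train.Supervisor(logdir=mylogdir, save_model_secs=0, save_summaries_secs=0)
eval_writer = tf.summary.FileWriter(mylogdir + '/eval')

with sv.managed_session() as sess:
    predictions = sess.run(y_conv, feed_dict={x: xtest, keep_prob: 1.0})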

Saving the state of the AdaGrad algorithm in Tensorflow

I am trying to train a word2vec model, and want to use the embeddings for another application. As there might be extra data later, and my computer is slow when training, I would like my script to stop and resume training later.
To do this, I created a saver:
saver = tf.train.Saver({"embeddings": embeddings,
                        "embeddings_softmax_weights": softmax_weights,
                        "embeddings_softmax_biases": softmax_biases})
I save the embeddings, and softmax weights and biases so I can resume training later. (I assume that this is the correct way, but please correct me if I'm wrong).
Unfortunately when resuming training with this script the average loss seems to go up again.
My idea is that this can be attributed to the AdaGradOptimizer I'm using. Initially the outer product matrix will probably be set to all zeros, whereas after my training it will be filled (leading to a lower learning rate).
Is there a way to save the optimizer state to resume learning later?
While TensorFlow seems to complain when you attempt to serialize an optimizer object directly (e.g. via tf.add_to_collection("optimizers", optimizer) and a subsequent call to tf.train.Saver().save()), you can save and restore the training update operation which is derived from the optimizer:
# init
if not load_model:
    optimizer = tf.train.AdamOptimizer(1e-4)
    train_step = optimizer.minimize(loss)
    tf.add_to_collection("train_step", train_step)
else:
    saver = tf.train.import_meta_graph(modelfile + '.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    train_step = tf.get_collection("train_step")[0]

# training loop
while training:
    if iteration % save_interval == 0:
        saver = tf.train.Saver()
        save_path = saver.save(sess, filepath)
I do not know of a way to get or set the parameters specific to an existing optimizer, so I do not have a direct way of verifying that the optimizer's internal state was restored, but training resumes with loss and accuracy comparable to when the snapshot was created.
I would also recommend using the parameterless call to Saver() so that state variables not specifically mentioned will still be saved, although this might not be strictly necessary.
You may also wish to save the iteration or epoch number for later restoring, as detailed in this example:
http://www.seaandsailor.com/tensorflow-checkpointing.html
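As a small sketch of why the parameterless Saver() is enough (the variable and loss below are illustrative stand-ins, not the question's word2vec script): the AdaGrad accumulators are ordinary global variables, so a plain tf.train.Saver() saves and restores them along with the weights.

import tensorflow as tf

embeddings = tf.Variable(tf.random_uniform([10000, 128], -1.0, 1.0),
                         name="embeddings")
loss = tf.reduce_sum(tf.square(embeddings))            # stand-in loss for illustration
train_step = tf.train.AdagradOptimizer(1.0).minimize(loss)

# minimize() created one accumulator slot per trained variable; the slot is a
# global variable too, so a parameterless Saver() will include it.
print([v.name for v in tf.global_variables()])
# e.g. ['embeddings:0', 'embeddings/Adagrad:0']

saver = tf.train.Saver()                               # captures the slot as well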

Tensorflow: Saving and restoring the model parameters

I am a beginner in TensorFlow, currently training a CNN.
I am using Saver to save the parameters used by the model, but I have concerns about whether this by itself stores all the Variables used by the model, and whether it is sufficient to restore those values in order to re-run the program for classification/testing on the trained network.
Let us look at the famous MNIST example given by TensorFlow.
In the example, we have a bunch of convolutional blocks, all of which have weight and bias variables that get initialised when the program is run:
W_conv1 = init_weight([5,5,1,32])
b_conv1 = init_bias([32])
After having processed several layers, we create a session, and initialise all the variables added to the graph.
sess = tf.Session()
sess.run(tf.initialize_all_variables())
saver = tf.train.Saver()
Here, is it possible to comment out the saver.save code and replace it with saver.restore(sess, file_path) after the training, in order to restore the weight, bias, etc. parameters back into the graph? Is this how it should be done?
for i in range(1000):
    ...
    if i % 500 == 0:
        saver.save(sess, "model%d.cpkt" % (i))
I am currently training on a large dataset, so terminating and restarting the training is a waste of time and resources. I would appreciate clarification before I start the training.
If you want to save the final result only once, you can do this:
with tf.Session() as sess:
    for i in range(1000):
        ...
    path = saver.save(sess, "model.ckpt")  # out of the loop
    print "Saved:", path
In other programs, you can load the model using the path returned from saver.save for prediction or something. You can see some examples at https://github.com/sugyan/tensorflow-mnist.
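As a rough sketch of that loading step (assuming the prediction program rebuilds the same graph first, so x, keep_prob and y_conv below come from that rebuilt graph, and test_images is a hypothetical input array):

# Rebuild the same graph (W_conv1, b_conv1, ..., y_conv) before this point, then
# restore the saved values instead of running the initializer.
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "model.ckpt")      # the path returned by saver.save(...)
    predictions = sess.run(y_conv, feed_dict={x: test_images, keep_prob: 1.0})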
Based on the explanation here and Sung Kim's solution, I wrote a very simple model exactly for this problem. Basically, you need to create an object of the same class and restore its variables from the saver. You can find an example of this solution here.