Tensorflow: Saving and restoring the model parameters - tensorflow

I am a beginner in TensorFlow, currently training a CNN.
I am using Saver in order to save the parameters used by the model, but I am having concerns whether this would itself store all the Variables used by the model, and is sufficient to restore the values to re-run the program for performing classification/testing on the trained network.
Let us look at the famous example MNIST given by TensorFlow.
In the example, we have bunch of Convolutional blocks, all of which have weight, and bias variables that gets initialised when the program is run.
W_conv1 = init_weight([5,5,1,32])
b_conv1 = init_bias([32])
After having processed several layers, we create a session, and initialise all the variables added to the graph.
sess = tf.Session()
sess.run(tf.initialize_all_variables())
saver = tf.train.Saver()
Here, is it possible to comment the saver.save code, and replace it by saver.restore(sess,file_path) after the training, in order to restore the weight, bias, etc., parameters back to the graph? Is this how it should be ?
for i in range(1000):
...
if i%500 == 0:
saver.save(sess,"model%d.cpkt"%(i))
I am currently training on large dataset, so terminating, and restarting the training is a waste of time, and resources so I request someone to please clarify before the I start the training.

If you want to save the final result only once, you can do this:
with tf.Session() as sess:
for i in range(1000):
...
path = saver.save(sess, "model.ckpt") # out of the loop
print "Saved:", path
In other programs, you can load the model using the path returned from saver.save for prediction or something. You can see some examples at https://github.com/sugyan/tensorflow-mnist.

Based on the explanation in here and Sung Kim solution I wrote a very simple model exactly for this problem. Basically in this way you need to create an object from the same class and restore its variables from the saver. You can find an example of this solution here.

Related

Tensorflow use trained model to make predictions initialization error

I've trained a model on some data (just a simple classification task). After, I wish to use this same model to run some predictions via a separate function make_prediction().
So currently my main file is simply something like :
agent.train(data)
agent.make_predictions(new_data)
and tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
I don't initialize the variables in my second function so that session is different to the previous but it is surprising to me that I can't simply reopen a previous session. Do I need to checkpoint the model after training and then reload it each time?
Thanks a lot

How I reuse trained model in DNN?

Everyone!
I have a question releate in trained model reusing( tensorflow ).
I have train model
I want predict new data used trained model.
I use DNNClassifier.
I have a model.ckpt-200000.meta, model.ckpt-200000.index, checkpoint, and eval folder.
but I don't know reuse this model..
plz help me.
First, you need to import your graph,
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
new_saver = tf.train.import_meta_graph('model.ckpt-200000.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))
Then you can give input to the graph and get the output.
graph = tf.get_default_graph()
input = graph.get_tensor_by_name("input:0")#input tensor by name
feed_dict ={input:} #input to the model
#Now, access theoutput operation.
op_to_restore = graph.get_tensor_by_name("y_:0") #output tensor
print sess.run(op_to_restore,feed_dict) #get output here
Few things to note,
You can replace the above code with your training part of the graph
(i.e you can get the output without training).
However, you still have to construct your graph as previously and
only replace the training part.
Above method only loading the weights for the constructed graph. Therefore, you have to construct the graph first.
A good tutorial on this can be found here, http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
If you don't want to construct the graph again you can follow this tutorial, https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc

Tensorflow save/restore batch norm

I trained a model with batch norm in Tensorflow. I would like to save the model and restore it for further using. The batch norm is done by
def batch_norm(input, phase):
return tf.layers.batch_normalization(input, training=phase)
where the phase is True during training and False during testing.
It seems like simply calling
saver = tf.train.Saver()
saver.save(sess, savedir + "ckpt")
would not work well because when I restore the model it first says restored successfully. It also says Attempting to use uninitialized value batch_normalization_585/beta if I just run one node in the graph. Is this related to not saving the model properly or something else that I've missed?
I also had the "Attempting to use uninitialized value batch_normalization_585/beta" error. This comes from the fact that by declaring the saver with the empty brackets like this:
saver = tf.train.Saver()
The saver will save the variables contained in tf.trainable_variables() which do not contain the moving average of the batch normalization. To include this variables into the saved ckpt you need to do:
saver = tf.train.Saver(tf.global_variables())
Which saves ALL the variables, so it is very memory consuming. Or you must identify the variables that have moving avg or variance and save them by declaring them like:
saver = tf.train.Saver(tf.trainable_variables() + list_of_extra_variables)
Not sure if this needs to be explained, but just in case (and for other potential viewers).
Whenever you create an operation in TensorFlow, a new node is added to the graph. No two nodes in a graph can have the same name. You can define the name of any node you create, but if you don't give a name, TensorFlow will pick one for you in a deterministic way (that is, not randomly, but instead always with the same sequence). If you add two numbers, it will probably be Add, but if you do another addition, since no two nodes can have the same name, it may be something like Add_2. Once a node is created in a graph its name cannot be changed. Many functions create several subnodes in turn; for example, tf.layers.batch_normalization creates some internal variables beta and gamma.
Saving and restoring works in the following way:
You create a graph representing the model that you want. This graph contains the variables that will be saved by the saver.
You initialize, train or do whatever you want with that graph, and the variables in the model get assigned some values.
You call save on the saver to, well, save the values of the variables to a file.
Now you recreate the model in a different graph (it can be a different Python session altogether or just another graph coexisting with the first one). The model must be created in exactly the same way the first one was.
You call restore on the saver to retrieve the values of the variables.
In order for this to work, the names of the variables in the first and the second graph must be exactly the same.
In your example, TensorFlow is complaining about the variable batch_normalization_585/beta. It seems that you have called tf.layers.batch_normalization nearly 600 times in the same graph, so you have that many beta variables hanging around. I doubt that you actually need that many, so I guess you are just experimenting with the API and ended up with that many copies.
Here's a draft of something that should work:
import tensorflow as tf
def make_model():
input = tf.placeholder(...)
phase = tf.placeholder(...)
input_norm = tf.layers.batch_normalization(input, training=phase))
# Do some operations with input_norm
output = ...
saver = tf.train.Saver()
return input, output, phase, saver
# We work with one graph first
g1 = tf.Graph()
with g1.as_default():
input, output, phase, saver = make_model()
with tf.Session() as sess:
# Do your training or whatever...
saver.save(sess, savedir + "ckpt")
# We work with a second different graph now
g2 = tf.Graph()
with g2.as_default():
input, output, phase, saver = make_model()
with tf.Session() as sess:
saver.restore(sess, savedir + "ckpt")
# Continue using your model...
Again, the typical case is not to have two graphs side by side, but rather have one graph and then recreate it in another Python session later, but in the end both things are the same. The important part is that the model is created in the same way (and therefore with the same node names) in both cases.

How to use Test data on saved model with queue approach (without feed_dict) #tensorflow?

I am new to tensorflow. I have build a convonet for mnist image classification as follows I am using queues to read images(png) from the disk batch it and pass it to train op (I am quite comfortable with this now) It's all good till train and I am evaluating my accuracy op at certain number of steps while training.
I am saving the model with Saver object and can see the meta and checkpoint file being written on the disk.
Now the real challenge is to restore the model once it has finished training and use it for predictions on new images
One of the first step in my graph (to train) is like below which takes x_image (images from train queue) h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
As I am not using feed dictionary approach, I can not just restore the accuracy op using saver and pass the new data. I have to define the queue for test data and rebuild the graph (exactly as earlier) with reference x_image changed to point to test data Queue.
How can I now restore the learned weights while training and use it to with this new graph to simply run my predict/accuracy op.
I tried to follow
- https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py tutorial but got lost with eval code.
Also if I add a dummy constant in my training graph and then try to retrive it's value, I am able to retrive it.
Can any 1 please help. Thanks
OK, So I have found the answer.
The original challenge was to to toggle between train and test data while training and validation phase when using queues.
Now as queues are part of graph structure, we can't simply modify them.
I found an article to use tf.case to toggle between train and test queue but I wasn't able to use shuffle batch along with it.
The real task at hand was to save the model post training and use the saved model to predict in production.
So here is the flow:
Training
create a method that creates your graph (will take image tensor as
input).
Build a training graph by passing training image batches
Perform training and save the model with saver object.
Evaluation
Now reconstruct the same graph with test image batches.
In the session use saver object to restore the weights (Note you dont need to pass which variables to restore, by default it restores all restore able variables)
Dont run gloabl variable initializer at this time
Run your predict op (generated from the newly constructed graph)
Also make sure you switch off the drop out functionality in the eval as it would keep varying the output for the same input
Below is the pseudocode
train_op, y_predict, accuracy = create_graph(train_input, train_label)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
model_saver = tf.train.Saver()
for i in range(2000):
if i%100 == 0:
train_accuracy = sess.run(accuracy)
print("step %d, training accuracy %f" %(i, train_accuracy))
sess.run(train_op)
print(sess.run(accuracy))
model_saver.save(sess, 'model/simple_model', global_step=100)
coord.request_stop()
coord.join(threads)
For evaluation
_, y_predict, accuracy = create_graph(test_input, test_label)
saver = tf.train.Saver()
with tf.Session() as sess:
saver.restore(sess, tf.train.latest_checkpoint("./model/"))
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
label_predict = sess.run([y_predict])

Can I retrain an old model with new data using TensorFlow?

I am new to TensorFlow and I am just trying to see if my idea is even possible.
I have trained a model with multi class classifier. Now I can classify a sentence in input, but I would like to change the result of CNN, for example, to improve the score of classification or change the classification.
I want to try to train just a single sentence with its class on a trained model, is this possible?
If I understand your question correctly, you are trying to reload a previously trained model either to run it through further iterations, test it on a new sentence, or fine tune the model a bit. If this is the case, yes you can do this. Look into saving and restoring models (https://www.tensorflow.org/api_guides/python/state_ops#Saving_and_Restoring_Variables).
To give you a rough outline, when you initially train your model, after setting up the network architecture, set up a saver:
trainable_var = tf.trainable_variables()
sess = tf.Session()
saver = tf.train.Saver()
sess.run(tf.global_variables_initializer
# Run/train your model until some completion criteria is reached
#....
#....
saver.save(sess, 'model.ckpt')
Now, to reload your model:
saver = tf.train.import_meta_graph('model.ckpt.meta')
saver.restore('model.ckpt')
#Note: if you have already defined all variables before restoring the model, import_meta_graph is not necessary
This will give you access to all the trained variables and you can now feed in whatever new sentence you have. Hope this helps.