In TensorFlow, is there any difference in saving and restoring LSTM models?

I have tried to save and restore an LSTM model according to the tutorial. Saving and restoring works for ordinary models such as CNNs. However, when I try to restore the LSTM model, it throws this error:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value RNN_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
The key code is as follows:
    with tf.Session() as sess:
        saver.restore(sess, saver_path)
        with tf.variable_scope('RNN_model', reuse=None):
            train_rnn = RNNmodel.LSTMmodel(True, RNNmodel.TRAIN_BATCH_SIZE, RNNmodel.NUM_STEP)
        with tf.variable_scope('RNN_model', reuse=True):
            test_rnn = RNNmodel.LSTMmodel(False, RNNmodel.EVAL_BATCH_SIZE, RNNmodel.NUM_STEP)
I wonder whether there is any difference between ordinary models and LSTM models when saving and restoring. Please help.
EDIT:
I moved the restore call and that part now works, but when I run my model it still throws the same error. My run_epoch code looks like this:
    def run_epoch(session, model, datas, train_op, is_log, epoch=3000):
        state = session.run(model.initiate_state)
        total_cost = 0
        for i in range(epoch):
            data, label = random_get_data(datas, model.batch_size, num_step=RNNmodel.NUM_STEP)
            feed_dict = {
                model.input_data: data,
                model.target: label,
                model.initiate_state: state
            }
            cost, state, argmax_logit, target, _ = session.run([model.loss, model.final_state, model.argmax_target, model.target, train_op], feed_dict)
The log locates the error at this line:
    cost, state, argmax_logit, target, _ = session.run([model.loss, model.final_state, model.argmax_target, model.target, train_op], feed_dict)
and shows the following:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value RNN_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
It seems that the restore does not restore the LSTM kernel variable. Should I do anything special to initialize the LSTM variables?
EDIT2:
I finally checked the checkpoint file, and I am sure that the save operation does not save the variables of the LSTM cells, and I don't know why. It seems that I have to name the variables explicitly, otherwise I cannot save them, and the BasicLSTMCell class's __init__() does not have a name parameter.

There is no difference; RNNs use ordinary variables.
I think you have to move
    saver.restore(sess, saver_path)
after you create the LSTMmodel. Otherwise its variables are not in the graph when you call restore, so they won't be restored.

I finally figured it out. Following the question "Saving and Restoring a trained LSTM in Tensor Flow" and the answer of Jeronimo Garcia-Loygorri, I moved the creation of the Saver after the LSTM model's definition, and every problem went away!
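To make the ordering issue concrete, here is a toy, TensorFlow-free sketch. ToyGraph and ToySaver are hypothetical stand-ins, not TensorFlow classes, and "conv/weights" is a made-up variable name. The one assumption it models is that a saver built without an explicit variable list only tracks the variables that already exist when it is constructed.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph: a registry of variable values."""

    def __init__(self):
        self.variables = {}  # name -> value; None means "uninitialized"

    def add_variable(self, name):
        self.variables[name] = None


class ToySaver:
    """Hypothetical stand-in for a saver that snapshots the variable list once."""

    def __init__(self, graph):
        self.graph = graph
        # Only variables that exist at construction time are tracked.
        self.tracked = list(graph.variables)

    def restore(self, checkpoint):
        for name in self.tracked:
            self.graph.variables[name] = checkpoint[name]


def uninitialized(graph):
    """Return the sorted names of variables that still have no value."""
    return sorted(n for n, v in graph.variables.items() if v is None)


KERNEL = "RNN_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/kernel"
checkpoint = {"conv/weights": 1.0, KERNEL: 2.0}

# Saver created BEFORE the LSTM variables exist: the kernel is never restored.
early = ToyGraph()
early.add_variable("conv/weights")
early_saver = ToySaver(early)
early.add_variable(KERNEL)
early_saver.restore(checkpoint)
early_missing = uninitialized(early)

# Saver created AFTER the whole model is defined: everything is restored.
late = ToyGraph()
late.add_variable("conv/weights")
late.add_variable(KERNEL)
ToySaver(late).restore(checkpoint)
late_missing = uninitialized(late)

print(early_missing, late_missing)
```

In the first case the kernel stays uninitialized, which mirrors the FailedPreconditionError in the question; in the second case nothing is left uninitialized.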

Related

Tensorflow use trained model to make predictions initialization error

I've trained a model on some data (just a simple classification task). Afterwards, I wish to use this same model to run some predictions via a separate function make_prediction().
So currently my main file is simply something like:
    agent.train(data)
    agent.make_predictions(new_data)
and
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
I don't initialize the variables in my second function, so that session is different from the previous one, but it is surprising to me that I can't simply reopen a previous session. Do I need to checkpoint the model after training and then reload it each time?
Thanks a lot

Tensorflow save/restore batch norm

I trained a model with batch norm in TensorFlow. I would like to save the model and restore it for further use. The batch norm is done by
    def batch_norm(input, phase):
        return tf.layers.batch_normalization(input, training=phase)
where phase is True during training and False during testing.
It seems like simply calling
    saver = tf.train.Saver()
    saver.save(sess, savedir + "ckpt")
would not work well: when I restore the model, it first reports that it was restored successfully, but then says Attempting to use uninitialized value batch_normalization_585/beta if I run just one node in the graph. Is this related to not saving the model properly, or is it something else that I've missed?
I also had the "Attempting to use uninitialized value batch_normalization_585/beta" error. This comes from declaring the saver with empty brackets, like this:
    saver = tf.train.Saver()
The saver will save the variables contained in tf.trainable_variables(), which do not include the moving averages of the batch normalization. To include these variables in the saved ckpt you need to do:
    saver = tf.train.Saver(tf.global_variables())
This saves ALL the variables, so it is very memory consuming. Alternatively, you must identify the variables that hold the moving average or variance and save them by declaring the saver like this:
    saver = tf.train.Saver(tf.trainable_variables() + list_of_extra_variables)
Not sure if this needs to be explained, but just in case (and for other potential viewers).
Whenever you create an operation in TensorFlow, a new node is added to the graph. No two nodes in a graph can have the same name. You can define the name of any node you create, but if you don't give a name, TensorFlow will pick one for you in a deterministic way (that is, not randomly, but instead always with the same sequence). If you add two numbers, it will probably be Add, but if you do another addition, since no two nodes can have the same name, it may be something like Add_1. Once a node is created in a graph, its name cannot be changed. Many functions create several subnodes in turn; for example, tf.layers.batch_normalization creates some internal variables beta and gamma.
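The deterministic naming described above can be illustrated with a small, TensorFlow-free sketch. NameScope and unique_name here are hypothetical helpers written for this illustration, and the scheme is deliberately simplified (a per-name counter with a numeric suffix); it only shows why building the same model in the same order yields the same names.

```python
from collections import defaultdict


class NameScope:
    """Hypothetical, simplified model of per-graph name uniquification."""

    def __init__(self):
        self._counts = defaultdict(int)

    def unique_name(self, base):
        # The first use keeps the bare name; later uses get a numeric suffix.
        n = self._counts[base]
        self._counts[base] += 1
        return base if n == 0 else "%s_%d" % (base, n)


# Two independent "graphs" built in the same order get identical names.
first_graph = NameScope()
first_names = [first_graph.unique_name("Add") for _ in range(3)]

second_graph = NameScope()
second_names = [second_graph.unique_name("Add") for _ in range(3)]

print(first_names)
print(first_names == second_names)
```

Because the names depend only on the order of creation, calling the same model-building function again in the same graph (rather than in a fresh one) shifts every suffix, which is how names like batch_normalization_585 can arise.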
Saving and restoring works in the following way:
1. You create a graph representing the model that you want. This graph contains the variables that will be saved by the saver.
2. You initialize, train or do whatever you want with that graph, and the variables in the model get assigned some values.
3. You call save on the saver to, well, save the values of the variables to a file.
4. Now you recreate the model in a different graph (it can be a different Python session altogether or just another graph coexisting with the first one). The model must be created in exactly the same way the first one was.
5. You call restore on the saver to retrieve the values of the variables.
In order for this to work, the names of the variables in the first and the second graph must be exactly the same.
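The name-matching requirement can be sketched without TensorFlow by treating a checkpoint as a plain mapping from variable names to values. restore_by_name is a hypothetical helper, the values are made up, and the KeyError merely stands in for the "not found in checkpoint" failure a real restore would report.

```python
def restore_by_name(checkpoint, graph_variable_names):
    """Look up every graph variable in the checkpoint by its exact name."""
    missing = [n for n in graph_variable_names if n not in checkpoint]
    if missing:
        raise KeyError("not found in checkpoint: " + ", ".join(missing))
    return {n: checkpoint[n] for n in graph_variable_names}


# Values saved from a graph where the layer was created exactly once.
checkpoint = {
    "batch_normalization/beta": 0.1,
    "batch_normalization/gamma": 0.9,
}

# A graph rebuilt the same way has the same names, so restore succeeds.
same_graph = ["batch_normalization/beta", "batch_normalization/gamma"]
restored = restore_by_name(checkpoint, same_graph)

# A graph where the layer was created many times has drifted names.
drifted_graph = ["batch_normalization_585/beta", "batch_normalization_585/gamma"]
try:
    restore_by_name(checkpoint, drifted_graph)
    drift_error = None
except KeyError as exc:
    drift_error = str(exc)

print(restored)
print(drift_error)
```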
In your example, TensorFlow is complaining about the variable batch_normalization_585/beta. It seems that you have called tf.layers.batch_normalization nearly 600 times in the same graph, so you have that many beta variables hanging around. I doubt that you actually need that many, so I guess you are just experimenting with the API and ended up with that many copies.
Here's a draft of something that should work:
    import tensorflow as tf
    def make_model():
        input = tf.placeholder(...)
        phase = tf.placeholder(...)
        input_norm = tf.layers.batch_normalization(input, training=phase)
        # Do some operations with input_norm
        output = ...
        # Create the saver last, after all variables exist
        saver = tf.train.Saver()
        return input, output, phase, saver
    # We work with one graph first
    g1 = tf.Graph()
    with g1.as_default():
        input, output, phase, saver = make_model()
        with tf.Session() as sess:
            # Do your training or whatever...
            saver.save(sess, savedir + "ckpt")
    # We work with a second different graph now
    g2 = tf.Graph()
    with g2.as_default():
        input, output, phase, saver = make_model()
        with tf.Session() as sess:
            saver.restore(sess, savedir + "ckpt")
            # Continue using your model...
Again, the typical case is not to have two graphs side by side, but rather have one graph and then recreate it in another Python session later, but in the end both things are the same. The important part is that the model is created in the same way (and therefore with the same node names) in both cases.

How to get the output of a maxpool layer in a pre-trained model in TensorFlow?

I have a model that I trained. I wish to extract from the model the output of an intermediate maxpool layer.
I tried the following:
    saver = tf.train.import_meta_graph(BASE_DIR + LOG_DIR + '/model.ckpt.meta')
    saver.restore(sess, tf.train.latest_checkpoint(BASE_DIR + LOG_DIR))
    sess.run("maxpool/maxpool", feed_dict=feed_dict)
Here, feed_dict is a dictionary containing the placeholders and their contents for this run.
I keep getting the following error:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_1_1' with dtype float and shape...
What could be the cause of this? I generated all of the placeholders and passed them in the feed dictionary.
I ran into a similar issue and it was frustrating. What got me around it was filling out the name field for every variable and operation that I wanted to call later. You may also need to add your maxpool/maxpool op to a collection with tf.add_to_collection('name_for_maxpool_op', maxpool_op_handle). You can then restore the ops and named tensors with:
    # Restore from metagraph.
    saver = tf.train.import_meta_graph(...)
    sess = tf.Session()
    saver.restore(sess, ...)
    graph = sess.graph
    # Restore your ops and tensors.
    maxpool_op = tf.get_collection('name_for_maxpool_op')[0]  # returns a list, you want the first element
    a_tensor = graph.get_tensor_by_name('tensor_name:0')  # need the :0 added to your name
Then you would build your feed_dict using your restored tensors. More information can be found here. Also, as you mentioned in your comment, you need to pass the op itself to sess.run, not its name:
    sess.run(maxpool_op, feed_dict=feed_dict)
You can access your tensors and ops from a restored metagraph even if you did not name them (to avoid retraining the model with new fancy tensor names, for instance), but it can be a bit of a pain. The names given to the tensors automatically are not always the most transparent. You can list the names of all variables in your graph with:
print([v.name for v in tf.all_variables()])
You can hopefully find the name that you are looking for there and then restore that tensor using graph.get_tensor_by_name as described above.
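As a small aside on the ":0" suffix used with graph.get_tensor_by_name: a tensor name is the name of the op that produces it, followed by a colon and the index of that op's output. The helper below, split_tensor_name, is a hypothetical pure-Python sketch written for illustration, not a TensorFlow function.

```python
def split_tensor_name(tensor_name):
    """Split 'op_name:index' into (op_name, index); reject bare op names."""
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not op_name or not index.isdigit():
        raise ValueError(
            "expected '<op_name>:<output_index>', got %r" % tensor_name
        )
    return op_name, int(index)


# The first output of the op named 'maxpool/maxpool'.
op_name, output_index = split_tensor_name("maxpool/maxpool:0")
print(op_name, output_index)

# A bare op name has no output index, so it is not a valid tensor name.
try:
    split_tensor_name("maxpool/maxpool")
    bare_name_rejected = False
except ValueError:
    bare_name_rejected = True
print(bare_name_rejected)
```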

Tensorflow: Finetune pretrained model on new dataset with different number of classes

How can I finetune a pretrained model in tensorflow on a new dataset? In Caffe I can simply rename the last layer and set some parameters for random initialization. Is something similar possible in tensorflow?
Say I have a checkpoint file (deeplab_resnet.ckpt) and some code that sets up the computational graph, in which I can modify the last layer so that it has as many outputs as the new dataset has classes.
Then I try to start the session like this:
    sess = tf.Session(config=config)
    init = tf.initialize_all_variables()
    sess.run(init)
    trainable = tf.trainable_variables()
    saver = tf.train.Saver(var_list=trainable, max_to_keep=40)
    saver.restore(sess, 'ckpt_path/deeplab_resnet.ckpt')
However, this gives me an error when calling saver.restore, since it expects the exact same graph structure as the one it was saved from.
How can I only load all weights except for the last layer from the 'ckpt_path/deeplab_resnet.ckpt' file?
I also tried changing the Classification layer name but no luck there either...
I'm using the tensorflow-deeplab-resnet model
You can specify the names of the variables that you want to restore.
So, you can get a list of all of the variables in the model and filter out the variables of the last layer:
    all_vars = tf.all_variables()
    var_to_restore = [v for v in all_vars if not v.name.startswith('xxx')]
    saver = tf.train.Saver(var_to_restore)
See the documentation for the details.
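The filtering step itself is ordinary Python and can be tried without TensorFlow. In the sketch below, Var is a hypothetical stand-in for a variable object with a name attribute, and "fc_out" is a made-up scope name playing the role of 'xxx'; the real scope name depends on the model. The variables that are excluded from the restore list still need to be initialized some other way.

```python
from collections import namedtuple

# Hypothetical stand-in for a variable: only the name matters here.
Var = namedtuple("Var", ["name"])

LAST_LAYER_SCOPE = "fc_out"  # assumption: the scope of the replaced layer

all_vars = [
    Var("conv1/weights:0"),
    Var("conv1/biases:0"),
    Var("fc_out/weights:0"),
    Var("fc_out/biases:0"),
]

# Matching on 'scope/' avoids also excluding e.g. 'fc_out2/...'.
prefix = LAST_LAYER_SCOPE + "/"
vars_to_restore = [v for v in all_vars if not v.name.startswith(prefix)]
vars_to_init = [v for v in all_vars if v.name.startswith(prefix)]

print([v.name for v in vars_to_restore])
print([v.name for v in vars_to_init])
```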
Alternatively, you can try to load the whole model and create a new "branch" out of the layer before the last one, and use it in the cost function during training.

Tensorflow: Saving and restoring the model parameters

I am a beginner in TensorFlow, currently training a CNN.
I am using a Saver to save the parameters used by the model, but I am unsure whether it stores all the Variables used by the model, and whether that is sufficient to restore the values and re-run the program for classification/testing on the trained network.
Let us look at the famous MNIST example given by TensorFlow.
In the example, we have a bunch of convolutional blocks, all of which have weight and bias variables that get initialised when the program is run.
    W_conv1 = init_weight([5,5,1,32])
    b_conv1 = init_bias([32])
After having defined several layers, we create a session and initialise all the variables added to the graph.
    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    saver = tf.train.Saver()
Here, is it possible to comment out the saver.save code and replace it with saver.restore(sess, file_path) after the training, in order to restore the weights, biases, etc. back into the graph? Is this how it should be done?
    for i in range(1000):
        ...
        if i%500 == 0:
            saver.save(sess,"model%d.cpkt"%(i))
I am currently training on a large dataset, so terminating and restarting the training is a waste of time and resources. I would appreciate it if someone could clarify this before I start the training.
If you want to save the final result only once, you can do this:
    with tf.Session() as sess:
        for i in range(1000):
            ...
        path = saver.save(sess, "model.ckpt")  # out of the loop
        print("Saved:", path)
In other programs, you can load the model using the path returned from saver.save for prediction or something. You can see some examples at https://github.com/sugyan/tensorflow-mnist.
Based on the explanation here and Sung Kim's solution, I wrote a very simple model exactly for this problem. Basically, you need to create an object from the same class and restore its variables with the saver. You can find an example of this solution here.