I've been using TensorFlow's Saver class to save model parameters, but that class is going away in TensorFlow 2, so I need to replace it with Checkpoint. I can't figure out how to do that. All the examples in the documentation for Checkpoint assume you're saving a tf.keras.Model. I'm not using Keras, so that doesn't apply.
Saver just takes a list of variables to save, so that's what I'm starting from. How do I pass that to Checkpoint? It expects you to pass every checkpointable object as a named argument. I was hoping I could just say variables=[var1, var2, ...], but it doesn't accept lists. I could pass every variable as a separate argument, but what do I use as the names? The variable names? That defeats the whole purpose of Checkpoint, which is to be more robust by not depending on variable names. What is the intended way of writing checkpoints in code that doesn't use Keras?
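One workaround (a sketch, not necessarily the intended API usage): generate the keyword names from the list positions. They only have to be stable between save and restore, so this doesn't reintroduce a dependence on the variables' own names. (In recent TF 2 versions, Checkpoint may also track plain Python lists and dicts directly, so variables=[var1, var2] might just work.)

import tensorflow as tf

var1 = tf.Variable(1.0)
var2 = tf.Variable(2.0)
var_list = [var1, var2]

# Keyword names generated from list positions; they are part of the
# checkpoint format but independent of the variables' own names.
ckpt = tf.train.Checkpoint(**{"var_%d" % i: v for i, v in enumerate(var_list)})
save_path = ckpt.save("/tmp/ckpt/demo")

# Restoring works as long as the Checkpoint is rebuilt with the same keywords.
ckpt.restore(save_path)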
Related
I use this code to restore my model, but I don't know how to predict after restoring it. Which function can I use? I'm a beginner in TensorFlow and I have no idea which parameters or functions get saved.
In the script that restores the meta graph:

import tensorflow as tf
from sklearn.datasets import load_svmlight_file

sess = tf.Session()
saver = tf.train.import_meta_graph("/home/MachineLearning/model.ckpt.meta")
saver.restore(sess, tf.train.latest_checkpoint('./'))
print("Model restored with success")
x_predict, y_predict = load_svmlight_file('/MachineLearning/to_predict.csv')
x_predict = x_predict.toarray()
sess.run([], feed_dict)  # I don't know how to use the predict function here
These are the results:
$python predict.py
Model restored with success
Traceback (most recent call last):
File "predict.py", line 23, in <module>
sess.run([] ,feed_dict )
NameError: name 'feed_dict' is not defined
You're almost there. TensorFlow is simply a math library: your graph is a collection of math operations with their associated dependencies (specifically, a directed acyclic graph, or DAG).
When you loaded the graph and the associated variables (weights), you loaded all the definitions. Now you need to ask TensorFlow to compute some value in the graph. There are lots of values it could compute; the one you want is often named logits (a typical name for the output layer of a neural network). But note that it could be named anything (especially if this isn't a neural network model), so you need to understand the model. You might also want to compute an operation named accuracy, which is defined to compute the accuracy of a particular batch of inputs (again, this depends on your model).
Note that you will need to provide TensorFlow with whatever it needs to perform these computations. There is generally a placeholder where you pass in your data (and, during training, a placeholder for your labels, which you don't need for prediction because none of the operations you will ask TensorFlow to compute depend on them).
But you will need to get references to these various operations (logits and accuracy) and placeholders (x is a typical name). Since you loaded your graph from disk, you don't have the references (note that an alternative way of loading the model is to re-run the code that builds the model, which gives you easy access to the references you need).
In order to get the right references you can look them up by name. Here's how you would get a list of all the operations:
List of tensor names in graph in Tensorflow
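In TF1-style code this boils down to iterating over the default graph; a minimal sketch:

import tensorflow as tf

# After import_meta_graph, list every operation name in the default graph
# so you can spot the output tensor and the input placeholder you need.
for op in tf.get_default_graph().get_operations():
    print(op.name)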
Then to get a specific OP (operation) by name:
How to get a tensorflow op by name?
So you'll have something like this:
logits = tf.get_default_graph().get_tensor_by_name("logits:0")
x = tf.get_default_graph().get_tensor_by_name("x:0")
accuracy = tf.get_default_graph().get_tensor_by_name("accuracy:0")
Note that the :0 suffix is an output index: a tensor name is its operation's name plus the index of the output that produces it (most ops have a single output, hence :0), which is why get_tensor_by_name is used above rather than get_operation_by_name. Now you have all the references you need, and you can use sess.run to perform a specific computation, providing the input data and the tensors you'd like to have computed:
sess.run([logits, accuracy], feed_dict={x: your_input_data_in_numpy_format})
The names of these elements will vary by implementation; I've used the most common ones. If they weren't given meaningful names, it will be hard to identify them, and you'll need to look through the original code that produced the graph. In fact, if they weren't named properly, looking them up by name is so painful that it's probably better to just re-run the code that produced the original graph rather than import the meta graph. Notice that saver.restore only restores the actual data; import_meta_graph is the optional piece, which can be replaced by simply re-building the graph programmatically.
In TensorFlow's tutorial it says that there are two ways to use the EMA'ed weights for evaluation:
1. Build a model that uses the shadow variables instead of the variables. For this, use the average() method which returns the shadow variable for a given variable.
2. Build a model normally but load the checkpoint files to evaluate by using the shadow variable names. For this use the average_name() method. See the Saver class for more information on restoring saved variables.
I understand how to use the second method to use the EMA'ed weights for evaluation as an example is given. I was wondering if someone could give me a simple example of how to build a model that uses the shadow variables.
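Not an official example, but a minimal sketch of what the first method might look like, assuming a single weight variable w (all names here are made up):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([10, 5]), name="w")

ema = tf.train.ExponentialMovingAverage(decay=0.999)
maintain_averages_op = ema.apply([w])  # creates the shadow variable for w

# Training graph: use w directly and run maintain_averages_op after each
# weight update so the shadow variable tracks the moving average of w.
train_output = tf.matmul(x, w)

# Evaluation graph: average() returns the shadow variable for w, so this
# branch computes with the EMA'ed weights instead of the raw ones.
w_shadow = ema.average(w)
eval_output = tf.matmul(x, w_shadow)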
I am wondering what exactly is saved when I use tf.train.Saver() to save my model after every training epoch. The file seems rather large compared to what I am used to with Keras models; right now my RNN takes up 900 MB at each save. Is there any way to tell the saver to save only the trainable parameters? I would also like a way to save only part of the model. I know I can fetch the variables I define myself and save them in numpy format, but when I use the RNN classes I don't have direct access to their weights; I looked through the code and there is nothing like get_weights that I can see.
You can provide a list of variables to save in the Saver constructor, e.g.:
saver = tf.train.Saver(var_list=tf.trainable_variables())
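For context, a sketch of how that fits into a training script (the path is hypothetical):

import tensorflow as tf

# ... build the model ...

# Only the trainable parameters (weights and biases) go into the checkpoint;
# optimizer slot variables and other non-trainable globals are skipped,
# which is usually what makes the file so much smaller.
saver = tf.train.Saver(var_list=tf.trainable_variables())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run a training epoch ...
    saver.save(sess, "/tmp/model/ckpt")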
By default, when no var_list is passed, Saver saves everything returned by variables._all_saveable_objects(); that is, all global variables plus all saveable objects:
def _all_saveable_objects():
  """Returns all variables and `SaveableObject`s that must be checkpointed.

  Returns:
    A list of `Variable` and `SaveableObject` to be checkpointed
  """
  # TODO(andreasst): make this function public once things are settled.
  return (ops.get_collection(ops.GraphKeys.GLOBAL_VARIABLES) +
          ops.get_collection(ops.GraphKeys.SAVEABLE_OBJECTS))
I do some training in TensorFlow and save the whole session using a saver:
# ... define model
# add a saver
saver = tf.train.Saver()
# ... run a session
# ....
# save the model
save_path = saver.save(sess, fileSaver)
It works fine, and I can successfully restore the whole session by using the exact same model and calling:
saver.restore(sess, importSaverPath)
Now I want to modify only the optimizer while keeping the rest of the model constant (the computation graph stays the same apart from the optimizer):
# optimizer used before
# optimizer = tf.train.AdamOptimizer(
#     learning_rate=learningRate).minimize(costPrediction)
# the new optimizer I want to use
optimizer = tf.train.RMSPropOptimizer(
    learning_rate=learningRate, decay=0.9, momentum=0.1,
    epsilon=1e-5).minimize(costPrediction)
I also want to continue the training from the last graph state I saved (i.e., I want to restore the state of my variables and continue with another training algorithm). Of course I cannot use:
saver.restore
any longer, because the graph has changed.
So my question is: is there a way to restore only the variables (or, for later use, even just a subset of them) using saver.restore when the whole session has been saved? I looked for such a feature in the API documentation and online, but could not find any example or detailed enough explanation that would help me get it to work.
It is possible to restore a subset of variables by passing the list of variables as the var_list argument to the Saver constructor. However, when you change the optimizer, additional variables may have been created (momentum accumulators, for instance), and the variables associated with the previous optimizer, if any, will have been removed from the model. So simply using the old Saver object to restore will not work, especially if you constructed it with the default constructor, which uses tf.all_variables() as the var_list. You have to construct a Saver object over the subset of variables that you created in your model; then restore will work. Note that this leaves the new variables created by the new optimizer uninitialized, so you have to initialize them explicitly.
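A sketch of that recipe (the checkpoint path is hypothetical, and I'm assuming the old checkpoint held exactly the trainable variables):

import tensorflow as tf

# ... rebuild the model, now minimizing with RMSPropOptimizer ...

# Variables that already existed when the old checkpoint was written;
# assumed here to be exactly the trainable ones.
old_vars = tf.trainable_variables()
restorer = tf.train.Saver(var_list=old_vars)

with tf.Session() as sess:
    restorer.restore(sess, "/tmp/old_model/ckpt")
    # The new optimizer's slot variables were not in the checkpoint,
    # so explicitly initialize everything that was not restored.
    new_vars = [v for v in tf.global_variables() if v not in old_vars]
    sess.run(tf.variables_initializer(new_vars))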
I ran into the same problem. Inspired by keveman's answer, my solution is (see the sketch after this list):
1. Define your new graph (here only the new optimizer-related variables differ from the old graph).
2. Get all variables using tf.global_variables(). This returns a variable list I call g_vars.
3. Get all optimizer-related variables using tf.contrib.framework.get_variables_by_suffix('some variable filter'). The filter might be RMSProp or RMSProp_*. This function returns a variable list I call exclude_vars.
4. Take the variables that are in g_vars but not in exclude_vars:
   vars = [item for item in g_vars if item not in exclude_vars]
   These are the variables common to both the new and the old graph, and they are the ones you can now restore from the old model.
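Putting the steps together (the suffix filters and checkpoint path are placeholders you would adapt to your graph):

import tensorflow as tf

# 1. ... define the new graph, with RMSProp replacing the old optimizer ...

# 2./3. Split the global variables into optimizer state and everything else.
# RMSProp keeps two slots per variable, hence the two suffix lookups.
g_vars = tf.global_variables()
exclude_vars = (tf.contrib.framework.get_variables_by_suffix('RMSProp') +
                tf.contrib.framework.get_variables_by_suffix('RMSProp_1'))

# 4. The remaining variables exist in both graphs and can be restored.
common_vars = [item for item in g_vars if item not in exclude_vars]

saver = tf.train.Saver(var_list=common_vars)
with tf.Session() as sess:
    sess.run(tf.variables_initializer(exclude_vars))  # new optimizer slots
    saver.restore(sess, "/tmp/old_model/ckpt")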
You could also recover the original Saver from a MetaGraph protobuf first and then use that Saver to restore all the old variables safely. For a concrete example, take a look at the eval.py script in: TensorFlow: How do I release a model without source code?
I have built and trained a model. In a second phase I want to replace the last two layers and retrain them using different data.
I constantly get errors about uninitialized variables, even though I did run the initialization op on the new variables:
var_init_op = tf.initialize_variables(var_list=[fc1_weights, fc1_biases, fc2_weights, fc2_biases])
sess.run(var_init_op)
I understand I have to initialize the new optimizer (AdamOptimizer) as well, but I'm not sure how to do that.
Assuming I want to replace the optimizer (and other variables) mid-training, how do I initialize it without trashing the already-trained variables?
You can get all the trainable variables using tf.trainable_variables(), exclude from that list the variables that should be restored from the pretrained model, and then initialize the remaining ones.
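A sketch of that, assuming the replaced layers were built under scopes named fc1 and fc2 (the names and checkpoint path are hypothetical):

import tensorflow as tf

# Trainable variables belonging to the replaced layers.
new_vars = [v for v in tf.trainable_variables()
            if v.name.startswith("fc1") or v.name.startswith("fc2")]
# Everything else should come from the pretrained model.
pretrained_vars = [v for v in tf.trainable_variables() if v not in new_vars]

saver = tf.train.Saver(var_list=pretrained_vars)
with tf.Session() as sess:
    saver.restore(sess, "/tmp/pretrained/ckpt")
    # Initialize everything that was not restored: the new layers plus the
    # Adam optimizer's slot variables (global but not trainable).
    to_init = [v for v in tf.global_variables() if v not in pretrained_vars]
    sess.run(tf.variables_initializer(to_init))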