I am wondering what exactly is saved when I use a tf.train.Saver() to save my model after every training epoch. The file seems kind of large compared to what I am used to with Keras models. Right now my RNN takes up 900 MB at each save. Is there any way to tell the saver to only save the trainable parameters? I would also like a way to save only part of the model. I know I can just get the variables I define and save them using the numpy format but when I use the RNN classes I don't directly have access to their weights and I looked through the code and there is nothing like get_weights that I can see.
You can provide a list of variables to save in the Saver constructor, i.e. saver = tf.train.Saver(var_list=tf.trainable_variables())
If var_list is not specified, the Saver saves everything returned by variables._all_saveable_objects().
That is, by default the Saver saves all global variables and all saveable objects.
def _all_saveable_objects():
    """Returns all variables and `SaveableObject`s that must be checkpointed.
    Returns:
        A list of `Variable` and `SaveableObject` to be checkpointed
    """
    # TODO(andreasst): make this function public once things are settled.
    return (ops.get_collection(ops.GraphKeys.GLOBAL_VARIABLES) +
            ops.get_collection(ops.GraphKeys.SAVEABLE_OBJECTS))
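A back-of-the-envelope sketch of why the default checkpoint can be several times larger than the trainable weights alone: the global-variables collection also holds optimizer state. Adam, for example, keeps two slot variables (first and second moment) for every trainable variable. The function name, the 75-million-parameter figure and the float32 assumption below are illustrative and not taken from the question.

```python
def estimate_checkpoint_mb(num_trainable_values, slots_per_value=0, bytes_per_value=4):
    """Rough checkpoint size in MB (10**6 bytes), ignoring metadata and non-slot variables.

    slots_per_value: extra optimizer values stored per trainable value
    (assumption: 2 for Adam, 0 if only the trainable variables are saved).
    """
    total_values = num_trainable_values * (1 + slots_per_value)
    return total_values * bytes_per_value / 1e6


# Hypothetical model with 75 million float32 parameters.
weights_only = estimate_checkpoint_mb(75_000_000)                  # trainable variables only
with_adam = estimate_checkpoint_mb(75_000_000, slots_per_value=2)  # plus Adam's two slots
print(weights_only, with_adam)
```

Passing var_list=tf.trainable_variables() corresponds to the slots_per_value=0 case. Keep in mind that a checkpoint saved that way cannot resume the optimizer state, and it also leaves out any other non-trainable variables.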
So I have an old model with tensorflow 1.x code and it includes too much stuff I don't need, all I need is just the model and I created the model in a way I'm almost certain is identical to the previous one (I checked a bunch of stuff)
I have the .data and .index and a .meta file and I tried very many different types of things and either it says that "a few things weren't saved" and then lists all of the weights (but not really the entire thing, cause when the weights are too big it just adds three dots (...) )
I would LOVE to have someone tell me how I can use that in my new model
I tried:
model.load_weights
I tried:
tf.compat.v1.disable_eager_execution()
sess = tf.compat.v1.Session()
saver = tf.compat.v1.train.import_meta_graph('checkpoints/pix2pix-60.meta')
saver.restore( "checkpoints/pix2pix-60")
I tried:
tf.compat.v1.disable_eager_execution()
sess = tf.compat.v1.Session()
saver = tf.compat.v1.train.Checkpoint(model=gen)
saver.restore(tf.train.latest_checkpoint('checkpoints')).assert_consumed()
I tried:
ck_path = tf.train.latest_checkpoint('checkpoints')
gen.load_weights(ck_path)
I tried:
from tensorflow.python.training import checkpoint_utils as cp
ckpt = cp.load_checkpoint('checkpoints/pix2pix--60')
and then tried to see what I can do with that
and I think I tried honestly a bunch of more stuff
I honestly won't mind if someone can even just tell me how I can read the .index or .data files so that I can just copy the weights and from there I can deal with it
I would again really love some help,
Thanks!
It seems that your TF1.x model is saved in the ckpt format; to restore a ckpt model, you need to get the graph before loading the weights.
To convert it to a TF2.x model, you can instantiate the original model and then save it in the recommended saved_model format using the 2.x API.
You can continue with your second attempt: use compat.v1 to create a default Session, load the graph from the meta file, then load the weights. Note that saver.restore takes the session as its first argument, i.e. saver.restore(sess, "checkpoints/pix2pix-60"). After this, your Session will contain your graph and the loaded weights.
To convert to a 2.x model, you need to get the input and output tensors from the graph:
# you have loaded graph and weight into sess
sess.as_default()
g = sess.graph
# assuming that your input output names are "input:0", "output:0"
input_tensor = g.get_tensor_by_name("input:0")
output_tensor = g.get_tensor_by_name("output:0")
# then use tf2.x to save a saved_model format model
model = tf.keras.Model(input_tensor, output_tensor, name="tf2_model")
model.save("your_saved_dir")
A model in saved_model format stores both the graph and the weights, so you can simply use
model = tf.saved_model.load("your_saved_dir")
to load the model for use.
OK, so I think I figured it out, although it was quite tedious.
In the TensorFlow 1.x model all variables were created under tf.name_scope, and my TensorFlow 2.x model did not reproduce those scopes, so the variable names did not match. I pretty much had to change the names manually so they would fit, and then it really did load the weights, like this:
checkpoint = tf.train.Checkpoint(model=gen)
checkpoint.restore('checkpoints/pix2pix--60').assert_consumed()
this also seemed to work:
gen.load_weights('checkpoints/pix2pix--60')
however something is still not working correctly since the output is actually not what I am expecting (what the output is like in the tensorflow 1.x model)
It may have something to do with the batch_normalization weights that aren't being loaded but I checked and in my current tf 2.x model they are untrainable and are equal to exactly the weights that aren't being loaded
Another weird thing is that when I do gen.predict(x) it gives me a different outcome each time, so I guess the weights aren't being frozen or something...
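The manual renaming described above can be sketched as a small helper that rewrites a scope prefix in a name-to-value mapping. This is only an illustration: the variable names and the "generator/" prefix are invented, and strings stand in for the weight arrays.

```python
def rename_keys(weights, old_prefix, new_prefix=""):
    """Return a copy of `weights` with `old_prefix` replaced by `new_prefix` in every key.

    Keys that do not start with `old_prefix` are kept unchanged.
    """
    renamed = {}
    for name, value in weights.items():
        if name.startswith(old_prefix):
            name = new_prefix + name[len(old_prefix):]
        renamed[name] = value
    return renamed


# Hypothetical TF 1.x names, with placeholder strings standing in for arrays.
old = {"generator/encoder_1/conv/kernel": "w0", "generator/encoder_1/conv/bias": "b0"}
new = rename_keys(old, "generator/")
print(sorted(new))
```

To see which names actually need changing, tf.train.list_variables('checkpoints/pix2pix--60') returns the (name, shape) pairs stored in a checkpoint, which can be compared against the names of the variables in the new model.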
So I have yet to understand what went wrong previously, but I do know that there have been many changes in the API from tf1 to tf2, including default parameters and more. What I eventually did, which worked perfectly, was this:
tf_upgrade_v2 \
  --intree my_project/ \
  --outtree my_project_v2/ \
  --reportfile report.txt
as explained here
You just put all the pieces of code you want to convert in the folder my_project, and it creates a folder named my_project_v2 with the tf1 code converted to tf2.
I have trained a custom neural network with the function:
tf.estimator.train_and_evaluate
After training completes, the model directory contains the following files:
checkpoint
eval
events.out.tfevents.1538489166.ti
graph.pbtxt
model.ckpt-0.data-00000-of-00002
model.ckpt-0.data-00001-of-00002
model.ckpt-0.index
model.ckpt-0.meta
model.ckpt-10.data-00000-of-00002
model.ckpt-10.data-00001-of-00002
model.ckpt-10.index
model.ckpt-10.meta
Now I need to export the weights and biases of every layer into a raw data structure, e.g. a numpy array.
I have read multiple pages on TensorFlow and on other topics, but could not find an answer to this question. The first thing I would assume is to put the files together into graph.pb with freeze.py, as suggested here:
Tensorflow: How to convert .meta, .data and .index model files into one graph.pb file
But then still the main question is unsolved.
If you wish to evaluate tensors alone, you can check out this question. But if you wish to e.g. deploy your network, you can take a look at TensorFlow serving, which is probably the most performant one right now. Or if you want to export this network to other frameworks and use them there, you can actually use ONNX for this purpose.
If saving weights and biases in a numpy array is your strict requirement, you can follow this example:
# In a TF shell, define all requirements and call the model function
y = model(x, is_training=False, reuse=tf.AUTO_REUSE) # For example
Once you call this function, you can see all the variables in the graph by running
tf.global_variables()
You need to restore all these variables from the latest checkpoint (here in ./model_dir/) and then evaluate each of them to get its latest value.
checkpoint = tf.train.latest_checkpoint('./model_dir/')
# Build a function that assigns the checkpoint values to the graph variables
fine_tune = tf.contrib.slim.assign_from_checkpoint_fn(checkpoint,
                                                      tf.global_variables(),
                                                      ignore_missing_vars=True)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
fine_tune(sess)  # without this call the variables keep their freshly initialized values
gv = sess.run(tf.global_variables())
Now gv is a list with the values of all your variables (weights and biases), in the same order as tf.global_variables(). You can access any individual component by indexing (gv[5] etc.), or save the whole list with numpy. Since the arrays generally have different shapes, np.savez is a safer choice than packing them into a single array:
np.savez('my_weights', *gv)
This saves all your weights and biases in your current working directory as 'my_weights.npz', with entries named arr_0, arr_1, ... in list order.
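An alternative sketch that reads the checkpoint directly, without rebuilding the model graph. It relies on tf.train.load_checkpoint, whose reader offers get_variable_to_shape_map() and get_tensor(name); the helper names checkpoint_to_dict and load_weights are made up for this example.

```python
def checkpoint_to_dict(reader):
    """Read every tensor from a checkpoint reader into a {name: array} dict.

    `reader` is expected to offer get_variable_to_shape_map() and get_tensor(name),
    as the object returned by tf.train.load_checkpoint does.
    """
    names = sorted(reader.get_variable_to_shape_map())
    return {name: reader.get_tensor(name) for name in names}


def load_weights(ckpt_dir_or_file):
    """Convenience wrapper; TensorFlow is imported here so the helper above
    can be used and tested without it."""
    import tensorflow as tf
    return checkpoint_to_dict(tf.train.load_checkpoint(ckpt_dir_or_file))
```

Calling load_weights('./model_dir/') would then give a dictionary keyed by the variable names stored in the checkpoint, which is often easier to work with than positional indices such as gv[5].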
Hope this helps.
I have a network with weights filled by manual tf.assign, and now I want to save the network with the weight values but without the placeholder inputs. It seems tf.train.Saver works only when I have the feed_dict available, and tf.train.export_meta_graph only saves the network structure. I tried pickle and dill but they both have errors. Are there any better solutions for this kind of saving?
Placeholders convert the input data into Tensors so I guess they are an important part of the Graph and I don't understand why you don't want to include them.
Even if you use tf.assign, you can freeze the graph, which means combining the structure with the weights. What freezing does is to convert Tensorflow variables into constants.
You have to save the structure of your graph:
gdef = g.as_graph_def()
tf.train.write_graph(gdef,".","graph.pb",False)
Then save the weights (after training)
saver.save(sess, 'tmp/my-weights')
And freeze the graph according to the tutorial in https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite
After that, you can use the Graph.
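A minimal sketch of the freezing step itself, assuming a TF 1.x-style session whose variables already hold the values you want. It uses graph_util.convert_variables_to_constants, which expects node names rather than tensor names, hence the small helper. The function names, the "output:0" tensor name in the usage note and the output filename are assumptions for illustration.

```python
def to_node_names(tensor_names):
    """Strip the output index from tensor names: 'output:0' -> 'output'."""
    return [name.split(":")[0] for name in tensor_names]


def freeze_session(sess, output_tensor_names, out_dir=".", filename="frozen_graph.pb"):
    """Write a copy of sess.graph in which variables are replaced by constants."""
    import tensorflow.compat.v1 as tf  # imported lazily; TF 1.x-style API
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), to_node_names(output_tensor_names))
    # write_graph returns the path of the file it wrote
    return tf.train.write_graph(frozen, out_dir, filename, as_text=False)
```

A call such as freeze_session(sess, ["output:0"]) would keep only the part of the graph needed to compute the listed outputs, with the weights baked in as constants.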
I trained a model with batch norm in Tensorflow. I would like to save the model and restore it for further using. The batch norm is done by
def batch_norm(input, phase):
    return tf.layers.batch_normalization(input, training=phase)
where the phase is True during training and False during testing.
It seems like simply calling
saver = tf.train.Saver()
saver.save(sess, savedir + "ckpt")
would not work well because when I restore the model it first says restored successfully. It also says Attempting to use uninitialized value batch_normalization_585/beta if I just run one node in the graph. Is this related to not saving the model properly or something else that I've missed?
I also had the "Attempting to use uninitialized value batch_normalization_585/beta" error. One thing to check is which variables the saver covers. A saver declared with empty brackets like this:
saver = tf.train.Saver()
saves every variable in tf.global_variables(), which includes the moving mean and variance of the batch normalization. A saver built from tf.trainable_variables() alone does not contain them, because they are not trainable. To make sure these variables end up in the saved ckpt, you can pass the full list explicitly:
saver = tf.train.Saver(tf.global_variables())
This saves ALL the variables (optimizer state included), so the checkpoint is larger. Alternatively, identify the variables that hold the moving mean or variance and add them to the trainable ones:
saver = tf.train.Saver(tf.trainable_variables() + list_of_extra_variables)
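One way to build list_of_extra_variables is to filter on the variable names, since tf.layers.batch_normalization names its running statistics moving_mean and moving_variance. The sketch below works on plain name strings so it can be tried without a graph; the predicate name and the sample names are illustrative.

```python
def is_batch_norm_statistic(variable_name):
    """True for names such as 'batch_normalization/moving_mean:0'."""
    base = variable_name.split(":")[0].split("/")[-1]
    return base in ("moving_mean", "moving_variance")


# Names of the kind tf.layers.batch_normalization typically produces (illustrative).
names = [
    "batch_normalization/gamma:0",
    "batch_normalization/beta:0",
    "batch_normalization/moving_mean:0",
    "batch_normalization/moving_variance:0",
]
extra = [n for n in names if is_batch_norm_statistic(n)]
print(extra)
```

With a live graph the same predicate can build the list: list_of_extra_variables = [v for v in tf.global_variables() if is_batch_norm_statistic(v.name)].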
Not sure if this needs to be explained, but just in case (and for other potential viewers).
Whenever you create an operation in TensorFlow, a new node is added to the graph. No two nodes in a graph can have the same name. You can define the name of any node you create, but if you don't give a name, TensorFlow will pick one for you in a deterministic way (that is, not randomly, but always following the same sequence). If you add two numbers, the node will probably be called Add, but if you do another addition, since no two nodes can have the same name, it will get a numeric suffix, something like Add_1. Once a node is created in a graph, its name cannot be changed. Many functions create several subnodes in turn; for example, tf.layers.batch_normalization creates some internal variables such as beta and gamma.
Saving and restoring works in the following way:
1. You create a graph representing the model that you want. This graph contains the variables that will be saved by the saver.
2. You initialize, train or do whatever you want with that graph, and the variables in the model get assigned some values.
3. You call save on the saver to, well, save the values of the variables to a file.
4. Now you recreate the model in a different graph (it can be a different Python session altogether or just another graph coexisting with the first one). The model must be created in exactly the same way the first one was.
5. You call restore on the saver to retrieve the values of the variables.
In order for this to work, the names of the variables in the first and the second graph must be exactly the same.
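A toy model of the naming behaviour can make this concrete. The class below is not TensorFlow's implementation, only a simplified stand-in that hands out deterministic, unique names; it shows that building the same model in a fresh graph yields the same names, while building it again in an already used graph shifts every suffix.

```python
from collections import defaultdict


class ToyGraph:
    """Toy stand-in for a graph's name bookkeeping (not TensorFlow's real code)."""

    def __init__(self):
        self._uses = defaultdict(int)

    def unique_name(self, name):
        count = self._uses[name]
        self._uses[name] += 1
        return name if count == 0 else "%s_%d" % (name, count)


def build_model(graph):
    """Pretend model: two batch-norm layers and one addition."""
    return [graph.unique_name("batch_normalization"),
            graph.unique_name("batch_normalization"),
            graph.unique_name("Add")]


first = build_model(ToyGraph())   # fresh graph when saving
second = build_model(ToyGraph())  # fresh graph when restoring: same names
reused = ToyGraph()
build_model(reused)
third = build_model(reused)       # same graph reused: names shift
print(first, second, third)
```

In this toy, a suffix such as _585 would simply mean that the same default name had already been requested 585 times in that graph.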
In your example, TensorFlow is complaining about the variable batch_normalization_585/beta. It seems that you have called tf.layers.batch_normalization nearly 600 times in the same graph, so you have that many beta variables hanging around. I doubt that you actually need that many, so I guess you are just experimenting with the API and ended up with that many copies.
Here's a draft of something that should work:
import tensorflow as tf
def make_model():
    input = tf.placeholder(...)
    phase = tf.placeholder(...)
    input_norm = tf.layers.batch_normalization(input, training=phase)
    # Do some operations with input_norm
    output = ...
    saver = tf.train.Saver()
    return input, output, phase, saver
# We work with one graph first
g1 = tf.Graph()
with g1.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        # Do your training or whatever...
        saver.save(sess, savedir + "ckpt")
# We work with a second different graph now
g2 = tf.Graph()
with g2.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        saver.restore(sess, savedir + "ckpt")
        # Continue using your model...
Again, the typical case is not to have two graphs side by side, but rather have one graph and then recreate it in another Python session later, but in the end both things are the same. The important part is that the model is created in the same way (and therefore with the same node names) in both cases.
I do some training in Tensorflow and save the whole session using a saver:
# ... define model
# add a saver
saver = tf.train.Saver()
# ... run a session
# ....
# save the model
save_path = saver.save(sess,fileSaver)
It works fine, and I can successfully restore the whole session by using the exact same model and calling:
saver.restore(sess, importSaverPath)
Now I want to modify only the optimizer while keeping the rest of the model constant (the computation graph stays the same apart from the optimizer):
# optimizer used before
# optimizer = tf.train.AdamOptimizer(
#     learning_rate=learningRate).minimize(costPrediction)
# the new optimizer I want to use
optimizer = tf.train.RMSPropOptimizer(
    learning_rate=learningRate, decay=0.9, momentum=0.1,
    epsilon=1e-5).minimize(costPrediction)
I also want to continue the training from the last graph state I saved (i.e., I want to restore the state of my variables and continue with another training algorithm). Of course I cannot use:
saver.restore
any longer, because the graph has changed.
So my question is: is there a way to restore only variables using the saver.restore command (or even, maybe for later use, only a subset of variables), when the whole session has been saved? I looked for such feature in the API documentation and online, but could not find any example / detailed enough explanations that could help me get it to work.
It is possible to restore a subset of variables by passing the list of variables as the var_list argument to the Saver constructor. However, when you change the optimizer, additional variables may have been created (momentum accumulators, for instance), and the variables associated with the previous optimizer, if any, will have been removed from the model. So simply using the old Saver object to restore will not work, especially if you constructed it with the default constructor, which uses tf.all_variables (called tf.global_variables in later releases) as the var_list argument. You have to construct the Saver object on the subset of variables that exist both in your new model and in the checkpoint, and then restore will work. Note that this leaves the new variables created by the new optimizer uninitialized, so you have to initialize them explicitly.
I saw the same problem. Inspired by keveman's answer, my solution is:
1. Define your new graph (here only the variables related to the new optimizer differ from the old graph).
2. Get all variables using tf.global_variables(). This returns a variable list I called g_vars.
3. Get all optimizer-related variables using tf.contrib.framework.get_variables_by_suffix('some variable filter'). The filter may be RMSProp or RMSProp_*. This function returns a variable list I called exclude_vars.
4. Get the variables that are in g_vars but not in exclude_vars. Simply use
common_vars = [item for item in g_vars if item not in exclude_vars]
These variables are common to both the new and the old graph, so you can now restore them from the old model.
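The steps above can be sketched as follows, assuming TF 1.x-style slot naming, where optimizer slots end in the optimizer's name with an optional numeric suffix (for example dense/kernel/RMSProp_1:0). The function names are made up for this example, and non-slot optimizer variables such as Adam's beta1_power are not caught by this pattern and would need their own filter.

```python
import re


def is_optimizer_variable(name, optimizer_names=("RMSProp", "Adam")):
    """True if the last name component is an optimizer slot such as 'RMSProp' or 'RMSProp_1'."""
    last = name.split(":")[0].split("/")[-1]
    return any(re.fullmatch(re.escape(opt) + r"(_\d+)?", last) for opt in optimizer_names)


def restore_model_variables(sess, checkpoint_path):
    """Restore everything except optimizer slots, after initializing all variables.

    Assumes sess.graph is the current default graph.
    """
    import tensorflow.compat.v1 as tf  # imported lazily; TF 1.x-style API
    model_vars = [v for v in tf.global_variables() if not is_optimizer_variable(v.name)]
    sess.run(tf.global_variables_initializer())  # gives the new optimizer's variables a value
    tf.train.Saver(var_list=model_vars).restore(sess, checkpoint_path)
    return model_vars
```

Initializing everything first and restoring afterwards means the restored values overwrite the fresh ones for the model variables, while the new optimizer's slots keep their initial values.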
You could recover the original Saver from a MetaGraph protobuf first and then use that saver to restore all old variables safely. For a concrete example, take a look at the eval.py script in: TensorFlow: How do I release a model without source code?