Incremental training using TensorFlow

I want to train a model to classify 90K labels, so I am using so-called incremental training.
I initially train the model to classify only 1K labels, then add another 1K labels, expand the final FC layer's output dimension to 2K, and train for some more epochs. After that I add another 1K labels, and so on...
Note that this is NOT fine-tuning, where ALL parameters before the last FC layer are fixed so the output features can be cached. In my case I need to update all variables at every stage.
The solution I designed is:
1. Train for 1K labels.
2. Save the model.
3. Modify the graph so the last FC layer outputs 2K dimensions.
4. Initialize all variables.
5. Load the previous checkpoint, which overrides all parameters except the last layer's weights.
6. Train again and repeat.
So the key point here is to perform a partial restore of a checkpoint.
In TensorFlow, I use the following code to load a checkpoint:
saver.restore(sess, "model.ckpt")
However, it fails when there is a shape mismatch.
Could anyone help, either in how to partially restore/initialize variables, or how to implement incremental training in another way?

This is currently not simple to do. We are actively adding new APIs to make it easier.
In the meantime, if you are really determined, :), you can try the following when you change the FC layer's size:
Create a reader:
reader = tf.train.NewCheckpointReader(your_checkpoint_file)
Load all the variables in the checkpoint file:
cur_vars = reader.get_variable_to_shape_map().keys()
Remove the original FC layer's variable name:
cur_vars_without_fc = [name for name in cur_vars if name != your_fc_layer_var_name]
Create a saver with the corresponding variables from your graph (the saver needs the actual Variable objects, not just names):
saver = tf.train.Saver([v for v in tf.global_variables() if v.op.name in cur_vars_without_fc])
saver.restore(sess, your_checkpoint_file)
Initialize your new FC layer's variables:
sess.run([your_fc_layer_var.initializer])
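Putting those steps together, a minimal end-to-end sketch (assuming the resized FC layer's variables live under a hypothetical scope name 'fc_final'; adjust the filter to your own variable names):
import tensorflow as tf

ckpt_path = "model.ckpt"
reader = tf.train.NewCheckpointReader(ckpt_path)
ckpt_names = set(reader.get_variable_to_shape_map().keys())

# Restore every graph variable that exists in the checkpoint, except the resized FC layer.
restore_vars = [v for v in tf.global_variables()
                if v.op.name in ckpt_names and not v.op.name.startswith('fc_final')]
new_fc_vars = [v for v in tf.global_variables() if v.op.name.startswith('fc_final')]

with tf.Session() as sess:
    sess.run(tf.variables_initializer(new_fc_vars))        # fresh init for the wider FC layer
    tf.train.Saver(restore_vars).restore(sess, ckpt_path)  # everything else comes from the checkpoint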
Hope that helps!
Sherry

Related

Tensorflow: fine tuning a model with additional layers, key not found error

So, I have an architecture, let's call it Arc1, with several layers: A1->A2->A3->A4->Loss1. I trained this architecture with the loss function Loss1.
I have a new architecture Arc2: A1->A2->A3->A4->A5->A6->Loss2, where A1 to A4 have the same names in both architectures and A5 and A6 are new layers. I want to train the whole architecture Arc2 slowly with a lower learning rate, but want to restore A1 to A4 from the previously trained architecture Arc1. I tried implementing this in TensorFlow but I get the error:
tensorflow/core/framework/op_kernel.cc:1152] Not found: Key Arc/new_layers/A5_weights not found in checkpoint
[[Node: save/RestoreV2_38 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_38/tensor_names, save/RestoreV2_38/shape_and_slices)]]
It is true that the weights corresponding to the new layers are not in the old checkpoint, but as this is a standard technique, what would be a way to do this? As far as I know, freezing the network will not solve this, as I want the gradient to propagate all the way back to A1 (while learning with a lower learning rate later).
Training a new softmax layer is a standard technique, but should having 2 or more new fully connected (or other) layers really be a problem?
Assuming you are restoring variables using a tf.train.Saver, you need to specify the variables you want to restore in the constructor (__init__), otherwise it will default to all variables in the current graph.
saver = tf.train.Saver(old_vars)
with tf.Session() as sess:
    sess.run(tf.variables_initializer(new_vars))
    saver.restore(sess, save_path)
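One way to build old_vars and new_vars is to filter the graph's variables by scope; here is a sketch assuming the new layers live under the 'Arc/new_layers' scope shown in the error message, with a hypothetical checkpoint path:
import tensorflow as tf

save_path = 'arc1_model.ckpt'  # hypothetical path to the Arc1 checkpoint

all_vars = tf.global_variables()
new_vars = [v for v in all_vars if v.op.name.startswith('Arc/new_layers')]      # A5, A6
old_vars = [v for v in all_vars if not v.op.name.startswith('Arc/new_layers')]  # A1-A4

saver = tf.train.Saver(old_vars)                  # restores only the layers present in the old checkpoint
with tf.Session() as sess:
    sess.run(tf.variables_initializer(new_vars))  # A5/A6 start from fresh initialization
    saver.restore(sess, save_path)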

Tensorflow load pre-trained model use different optimizer

I want to load a pre-trained model (optimized by AdadeltaOptimizer) and continue training with SGD (GradientDescentOptimizer). The models are saved and loaded with tensorlayer API:
save model:
import tensorlayer as tl
tl.files.save_npz(network.all_params,
name=model_dir + "model-%d.npz" % global_step)
load model:
load_params = tl.files.load_npz(path=resume_dir + '/', name=model_name)
tl.files.assign_params(sess, load_params, network)
If I continue training with Adadelta, the training loss (cross entropy) looks normal (it starts at a value close to that of the loaded model). However, if I change the optimizer to SGD, the training loss is as large as that of a newly initialized model.
I took a look at the model-xxx.npz file produced by tl.files.save_npz. It only saves the model parameters as ndarrays. I'm not sure how the optimizer or learning rate is involved here.
You would probably have to fetch the loss/cross-entropy tensor that previously fed into your Adadelta optimizer, and then feed it into your SGD optimizer instead.
saver = tf.train.import_meta_graph('filename.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
cross_entropy = graph.get_tensor_by_name("entropy:0") #Tensor to import
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
In this case, I tagged the cross-entropy tensor with the name entropy before training my pretrained model, like so:
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv), name = 'entropy')
If you are unable to make changes to your pretrained model, you can obtain the list of tensors in your model (after you have imported it) from the graph and deduce which tensor you require. I have no experience with TensorLayer, so this guide is meant to provide understanding rather than exact code. You can take a look at TensorLayer's Layers documentation; it should explain how to obtain your tensor. As TensorLayer is built on top of TensorFlow, most of the functions should still be available.
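For example, to discover the right name after importing the meta graph, you can list the operations in the graph and look for the loss op (a small sketch; the actual names depend on your model):
for op in tf.get_default_graph().get_operations():
    print(op.name)    # scan the printed names for the loss/cross-entropy op
# then fetch the tensor by '<op_name>:0', e.g. graph.get_tensor_by_name("entropy:0")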
You can specify the parameters you want to save in your checkpoint file.
save_npz([save_list, name, sess])
In the save_list you're specifying only the network parameters that don't contain the optimizer parameters, thus no learning rate or any other optimizer parameters.
If you want to save the current learning rate (in order to use the same exact learning rate when you restore the model) you have to add it to the save_list, like that:
save_npz(network.all_params + [learning_rate])
(I suppose that all_params is a list, in which case this works.)
Since you want to change the optimizer, I suggest you save only the learning_rate as an optimizer parameter, and not any other variables that the optimizer creates.
That way, you'll be able to change the optimizer when restoring the model; otherwise (if you put any other optimizer variables in your checkpoint) the graph you try to restore won't have matching variables to put the saved values into, and you won't be able to change the optimizer.
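A sketch of that workflow, using only the tl.files calls already shown above; network, cost, model_dir and resume_dir are placeholders from the question's setup:
import tensorflow as tf
import tensorlayer as tl

# Save only the network weights, without any optimizer state.
tl.files.save_npz(network.all_params, name=model_dir + "model-weights.npz")

# Later, with the same graph definition but a new optimizer:
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())     # plain SGD has no extra state to restore
    params = tl.files.load_npz(path=resume_dir + '/', name="model-weights.npz")
    tl.files.assign_params(sess, params, network)   # overwrite the fresh weights with the saved ones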
For the TensorLayer 2.0 API, use this instead (see https://tensorlayer.readthedocs.io/en/latest/user/get_start_advance.html#pre-trained-cnn):
vgg = tl.models.vgg16(pretrained=True)
img = tl.vis.read_image('data/tiger.jpeg')
img = tl.prepro.imresize(img, (224, 224)).astype(np.float32) / 255
output = vgg(img, is_train=False)

Can I retrain an old model with new data using TensorFlow?

I am new to TensorFlow and I am just trying to see if my idea is even possible.
I have trained a model with a multi-class classifier. Now I can classify an input sentence, but I would like to change the result of the CNN, for example to improve the classification score or change the predicted class.
I want to try training the already-trained model on just a single sentence with its class; is this possible?
If I understand your question correctly, you are trying to reload a previously trained model either to run it through further iterations, test it on a new sentence, or fine tune the model a bit. If this is the case, yes you can do this. Look into saving and restoring models (https://www.tensorflow.org/api_guides/python/state_ops#Saving_and_Restoring_Variables).
To give you a rough outline, when you initially train your model, after setting up the network architecture, set up a saver:
trainable_var = tf.trainable_variables()
sess = tf.Session()
saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())
# Run/train your model until some completion criteria is reached
#....
#....
saver.save(sess, 'model.ckpt')
Now, to reload your model:
sess = tf.Session()
saver = tf.train.import_meta_graph('model.ckpt.meta')
saver.restore(sess, 'model.ckpt')
#Note: if you have already defined all variables before restoring the model, import_meta_graph is not necessary
This will give you access to all the trained variables and you can now feed in whatever new sentence you have. Hope this helps.
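A sketch of a single fine-tuning step on one new example after restoring (input_ph, label_ph, train_op and the encoded inputs are hypothetical names for your model's placeholders, training op and data):
feed = {input_ph: [encoded_sentence], label_ph: [sentence_class]}
for _ in range(5):                     # a few small gradient steps on the one example
    sess.run(train_op, feed_dict=feed)
saver.save(sess, 'model.ckpt')         # persist the adjusted weights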

Tensorflow: Finetune pretrained model on new dataset with different number of classes

How can I finetune a pretrained model in tensorflow on a new dataset? In Caffe I can simply rename the last layer and set some parameters for random initialization. Is something similar possible in tensorflow?
Say I have a checkpoint file (deeplab_resnet.ckpt) and some code that sets up the computational graph, in which I can modify the last layer such that it has the same number of outputs as the new dataset has classes.
Then I try to start the session like this:
sess = tf.Session(config=config)
init = tf.initialize_all_variables()
sess.run(init)
trainable = tf.trainable_variables()
saver = tf.train.Saver(var_list=trainable, max_to_keep=40)
saver.restore(sess, 'ckpt_path/deeplab_resnet.ckpt')
However this gives me an error when calling the saver.restore function, since it expects the exact same graph structure as the one it was saved from.
How can I only load all weights except for the last layer from the 'ckpt_path/deeplab_resnet.ckpt' file?
I also tried changing the Classification layer name but no luck there either...
I'm using the tensorflow-deeplab-resnet model
You can specify the names of the variables that you want to restore.
So, you can get a list of all of the variables in the model and filter out the variables of the last layer:
all_vars = tf.all_variables()
var_to_restore = [v for v in all_vars if not v.name.startswith('xxx')]
saver = tf.train.Saver(var_to_restore)
See the documentation for the details.
Alternatively, you can try to load the whole model and create a new "branch" out of the layer before the last, and use it in the cost function during training.
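A sketch of that alternative, assuming 'features' is the output tensor of the layer before the old classifier and using a hypothetical scope name 'fc_out_new' for the new head:
import tensorflow as tf

num_new_classes = 21                                      # however many classes the new dataset has
with tf.variable_scope('fc_out_new'):
    logits = tf.layers.dense(features, num_new_classes)   # fresh classification branch

restore_vars = [v for v in tf.global_variables()
                if not v.op.name.startswith('fc_out_new')]
saver = tf.train.Saver(restore_vars)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())            # initializes the new branch (and everything else)
    saver.restore(sess, 'ckpt_path/deeplab_resnet.ckpt')    # then overwrite the backbone with pretrained weights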

What is the best way to run saved model with different batch size in TensorFlow?

I trained the Cifar10 example model from TensorFlow's repository with batch_size 128 and it worked fine. Then I froze the graph and managed to run it with C++, just like they do in their C++ label image example.
The only problem was that I had to artificially generate a tensor of shape [128, image_height, image_width, channels] to classify a single image with C++, because the saved model expects an input of 128 samples per batch, since that is the number of samples that comes from the queue.
I tried training the Cifar10 example with batch_size = 1 and then I managed to classify examples one by one when running the model with C++, but that doesn't seem like a great solution. I also tried manually changing the tensor shapes in the saved graph file but it didn't work.
My question is: what is the best way to train a model with a fixed batch size (like 32, 64, 128, etc.) and then save the model so that it can be used with a batch of arbitrary size? If that's not possible, then how do I save the model to be able to classify samples one by one?
It sounds like the problem is that TensorFlow is "baking in" the batch size to other tensors in the graph (e.g. if the graph contains tf.shape(t) for some tensor t whose shape depends on the batch size, the batch size might be stored in the graph as a constant). The solution is to change your program slightly so that tf.train.batch() returns tensors with a variable batch size.
The tf.train.batch() method accepts a tf.Tensor for the batch_size argument. Perhaps the simplest way to modify your program for variable-sized batches would be to define a placeholder for the batch size:
# Define a scalar tensor for the batch size, so that you can alter it at
# Session.run()-time.
batch_size_tensor = tf.placeholder(tf.int32, shape=[])
input_tensors = tf.train.batch(..., batch_size=batch_size_tensor, ...)
This would prevent the batch size from being baked into your GraphDef, so you should be able to feed values of any batch size in C++. However, this modification would require you to feed a value for the batch size on every step, which is slightly tedious.
Assuming that you always want to train with batch size 128, but retain the flexibility to change the batch size later, you could use a tf.placeholder_with_default() to specify that the batch size should be 128 when you don't feed an alternative value:
# Define a scalar tensor for the batch size, so that you can alter it at
# Session.run()-time.
batch_size_tensor = tf.placeholder_with_default(128, shape=[])
input_tensors = tf.train.batch(..., batch_size=batch_size_tensor, ...)
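A usage sketch of the default: training steps need no extra feed, while evaluation can still override the batch size (train_op and logits are hypothetical names for your model's ops):
sess.run(train_op)                                           # uses the default batch size of 128
single = sess.run(logits, feed_dict={batch_size_tensor: 1})  # classify one example at a time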
Is there a reason you need fixed batch size in the graph?
I think a good way is to build a graph with a variable batch size - by putting None as the first dimension. During training, you can then pass the batch size flag to your data provider, so it feeds the desired amount of data in each iteration.
After the model is trained, you can export the graph using tf.train.Saver(), which exports the metagraph. To do inference, you can load the exported files and just evaluate with any number of examples - also just one.
Note, this is different from the frozen graph.
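A minimal sketch of that approach with illustrative shapes (not the actual Cifar10 code): the first dimension is None, so the same saved graph accepts 1, 32 or 128 examples at inference.
import tensorflow as tf

images = tf.placeholder(tf.float32, shape=[None, 24, 24, 3], name='images')
flat = tf.reshape(images, [-1, 24 * 24 * 3])      # -1 keeps the batch dimension flexible
logits = tf.layers.dense(flat, 10, name='logits')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.Saver().save(sess, 'cifar_variable_batch.ckpt')
# After restoring, you can feed a single image or a full batch into 'images'.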