variables_to_train flag in Tf-slim - tensorflow

I am fine-tuning a model from a pretrained checkpoint using TF-Slim. While using create_train_op, I noticed that it has a variables_to_train parameter. Some tutorials use the flag as follows:
all_trainable = [v for v in tf.trainable_variables()]
trainable = [v for v in all_trainable]
train_op = slim.learning.create_train_op(
    opt,
    global_step=global_step,
    variables_to_train=trainable,
    summarize_gradients=True)
But the official TF-Slim examples do not use it:
train_op = slim.learning.create_train_op(
    opt,
    global_step=global_step,
    summarize_gradients=True)
So, what is the difference between using and not using variables_to_train?

Your two examples both do the same thing: they train every trainable variable in your graph. The variables_to_train parameter lets you define which variables should be updated during training.
A use case for this is when you have pre-trained components, such as word embeddings, that you don't want to update in your model. With
train_vars = [v for v in tf.trainable_variables() if "embeddings" not in v.name]
train_op = slim.learning.create_train_op(
    opt,
    global_step=global_step,
    variables_to_train=train_vars,
    summarize_gradients=True)
you exclude from training all variables whose name contains "embeddings". If you simply want to train all variables, you don't have to define train_vars, and you can create the train op without the variables_to_train parameter.
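Another common fine-tuning pattern is to select variables by scope rather than by name substring. A minimal sketch, assuming the layers you want to fine-tune live under a variable scope named 'logits' (the scope name here is hypothetical):
# Collect only the trainable variables under the 'logits' scope
# (hypothetical name); everything else keeps its pre-trained weights.
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='logits')
train_op = slim.learning.create_train_op(
    opt,
    global_step=global_step,
    variables_to_train=train_vars,
    summarize_gradients=True)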

Related

Restoring a single variable tensor saved in one model to a variable tensor in another model - Tensorflow

Running on TensorFlow 1.3.0 GPU.
I have trained a model in TF and saved just a single variable tensor using:
embeddings = tf.Variable(tf.random_uniform([4**kmer_len, embedding_size], -0.04, 0.04), name='Embeddings')
# ... more code, variables ...
saver = tf.train.Saver({"Embeddings": embeddings})  # saving only the embeddings variable
# ... some more code, training the model ...
saver.save(ses, './embeddings/embedding_mat')  # saving the variable
Now I have a different model in a different file, and I would like to restore just the single saved embeddings variable into it. The problem is that this new model has some additional variables. When I try to restore the variable by doing:
embeddings = tf.Variable(tf.random_uniform([4**kmer_len_emb, embedding_size], -0.04, 0.04), name='Embeddings')
dense1 = tf.layers.dense(inputs=kmer_flattened, units=200, activation=tf.nn.relu, use_bias=True)
ses = tf.Session()
init = tf.global_variables_initializer()
ses.run(init)
saver = tf.train.Saver()
saver.restore(ses, './embeddings/embedding_mat')
I'm getting a "not found in checkpoint" error.
Any thoughts on how to deal with this?
Thanks
You must create an instance of Saver over just that variable:
saver = tf.train.Saver(var_list=[embeddings])
This tells your Saver instance to take care of restoring/saving only that particular variable of the graph; otherwise it will try to restore/save all the variables of the graph.
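For instance, since the checkpoint was written with the key "Embeddings", a name-mapped restore (a minimal sketch, reusing the path from the question) would be:
# Map the checkpoint key "Embeddings" to the new graph's variable and
# restore only that entry; all other variables are left untouched.
saver = tf.train.Saver({"Embeddings": embeddings})
saver.restore(ses, './embeddings/embedding_mat')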
It is because the Saver can't find dense1 in the checkpoint. Try this:
all_var = tf.global_variables()
var_to_restore = [v for v in all_var if v.name == 'Embeddings:0']
ses.run(init)
saver = tf.train.Saver(var_to_restore)
saver.restore(ses, './embeddings/embedding_mat')
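If you're unsure which names were actually written to the checkpoint, you can inspect it first; a small sketch using tf.train.NewCheckpointReader with the path from the question:
# Print the variable names and shapes stored in the checkpoint, so you
# can see exactly which keys a Saver is able to restore.
reader = tf.train.NewCheckpointReader('./embeddings/embedding_mat')
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)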

When we train a neural network with ExponentialMovingAverage, does the model use the decayed variable by default?

When we use tf.train.ExponentialMovingAverage.apply(var) to maintain moving averages of variables and then update a variable, for example with tf.assign, we can get the decayed value with tf.train.ExponentialMovingAverage.average(var); but if we fetch the variable directly with tf.Session.run(var), we get the value without decay.
For example:
import tensorflow as tf

v1 = tf.Variable(0, dtype=tf.float32)
ema = tf.train.ExponentialMovingAverage(0.99)
maintain_average = ema.apply([v1])
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print(sess.run([v1, ema.average(v1)]))
    # Out: [0.0, 0.0]
    sess.run(tf.assign(v1, 5))
    sess.run(maintain_average)
    print(sess.run([v1, ema.average(v1)]))
    # Out (approximately): [5.0, 0.05], i.e. 0.99 * 0 + 0.01 * 5
So when we train a neural network with ExponentialMovingAverage, does the model default to using the decayed variables returned by tf.train.ExponentialMovingAverage.average()?
A more concrete example:
image_tensor = tf.placeholder(tf.float32,
                              [BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS],
                              'image-tensor')
label_tensor = tf.placeholder(tf.int32,
                              [None, 10],
                              'label-tensor')
net_output = creat_net(image_tensor)  # suppose creat_net() has built a neural network
global_step = tf.Variable(0, trainable=False)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=net_output, labels=label_tensor))
loss = cross_entropy
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
with tf.control_dependencies([train_step]):
    training_op = ema.apply(tf.trainable_variables())
So when I run training_op to train the network, will the network use the averages by default, or do I need extra code to use the decayed variables? In other words, will GradientDescentOptimizer use the true values or the decayed values to compute the loss at the next step?
v1 is a variable with its own value (5.0 after the assign in your case).
tf.train.ExponentialMovingAverage maintains a separate shadow variable for each variable passed to apply(). The shadow variable is updated each time you run the op returned by apply() (maintain_average above); average() merely returns the current value of the shadow variable.
Computing the moving average never changes the input variables themselves, so training, including GradientDescentOptimizer, always uses the raw variable values by default. If you want to evaluate the model with the averaged weights, you have to load the shadow values into the variables yourself.
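For completeness, a minimal sketch of using the averaged weights at inference time, based on the ema object from the question (the checkpoint path is hypothetical):
# ema.variables_to_restore() maps each variable name to its shadow
# (averaged) counterpart, so restoring loads the decayed values into
# the model's variables.
saver = tf.train.Saver(ema.variables_to_restore())
with tf.Session() as sess:
    saver.restore(sess, '/tmp/model.ckpt')  # hypothetical checkpoint path
    # The graph's variables now hold the moving-average values.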

tensorflow RNN implementation

I'm building an RNN model to do image classification. I used a pipeline to feed in the data. However, it returns
ValueError: Variable rnn/rnn/basic_rnn_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
I wonder what I can do to fix this, since there are not many examples of implementing an RNN with an input pipeline. I know it would work if I used a placeholder, but my data is already in the form of tensors. Unless I can feed the placeholder with tensors, I prefer to just use the pipeline.
def RNN(inputs):
    with tf.variable_scope('cells', reuse=True):
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size)
    with tf.variable_scope('rnn'):
        outputs, states = tf.nn.dynamic_rnn(basic_cell, inputs, dtype=tf.float32)
    fc_drop = tf.nn.dropout(states, keep_prob)
    logits = tf.contrib.layers.fully_connected(fc_drop, batch_size, activation_fn=None)
    return logits

# Training
with tf.name_scope("cost_function") as scope:
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch)))
    train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(cost)

# Accuracy
with tf.name_scope("accuracy") as scope:
    correct_prediction = tf.equal(tf.argmax(RNN(test_image), 1), tf.argmax(test_image_label, 0))
    accuracy = tf.cast(correct_prediction, tf.float32)
You need to use the reuse option correctly; the following changes would solve it. For prediction, you need to reuse the variables that already exist in the graph.
def RNN(inputs, reuse):
    with tf.variable_scope('cells', reuse=reuse):
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size, reuse=reuse)
    ...

...
# Training
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch, reuse=None)))
# Accuracy
...
correct_prediction = tf.equal(tf.argmax(RNN(test_image, reuse=True), 1), tf.argmax(test_image_label, 0))
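In TensorFlow 1.4 and later, an alternative is to let the framework manage reuse with tf.AUTO_REUSE; a minimal sketch (the scope name 'rnn_model' is arbitrary):
def RNN(inputs):
    # reuse=tf.AUTO_REUSE creates the variables on the first call and
    # reuses them on every later call, for both training and evaluation.
    with tf.variable_scope('rnn_model', reuse=tf.AUTO_REUSE):
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size)
        outputs, states = tf.nn.dynamic_rnn(basic_cell, inputs, dtype=tf.float32)
        fc_drop = tf.nn.dropout(states, keep_prob)
        return tf.contrib.layers.fully_connected(fc_drop, batch_size, activation_fn=None)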

In TensorFlow, how to use a restored meta-graph if the meta-graph was fed with TFRecord input (without placeholders)

I trained a network with a TFRecord input pipeline. In other words, there were no placeholders. A simple example would be:
input, truth = _get_next_batch() # TFRecord. `input` is not a tf.placeholder
net = Model(input)
net.set_loss(truth)
optimizer = tf...(net.loss)
Let's say I acquired three files: ckpt-20000.meta, ckpt-20000.data-0000-of-0001, and ckpt-20000.index. I understand that one can later import the meta-graph using the .meta file and access tensors such as:
new_saver = tf.train.import_meta_graph('ckpt-20000.meta')
new_saver.restore(sess, 'ckpt-20000')
logits = tf.get_collection("logits")[0]
However, the meta-graph does not have a placeholder anywhere in the pipeline. Is there a way I can use the meta-graph and query inference on an input?
For information, in a query application (or script), I used to define a model with a placeholder and restore the model weights (see below). I am wondering if I can just utilize the meta-graph without redefining the model, since it would be much simpler.
input = tf.placeholder(...)
net = Model(input)
tf.restore(sess, 'ckpt-2000')
lgt = sess.run(net.logits, feed_dict = {input:img})
You can build a graph that uses placeholder_with_default() for the inputs, so you can use both the TFRecord input pipeline and feed_dict{}.
An example:
input, truth = _get_next_batch()
_x = tf.placeholder_with_default(input, shape=[...], name='input')
_y = tf.placeholder_with_default(truth, shape=[...], name='label')
net = Model(_x)
net.set_loss(_y)
optimizer = tf...(net.loss)
Then during inference,
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    new_saver = tf.train.import_meta_graph('ckpt-20000.meta')
    new_saver.restore(sess, 'ckpt-20000')
    # Get the tensors by their variable name
    input = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name(...)
    # Now you can feed the inputs to your tensors
    lgt = sess.run(logits, feed_dict={input: img})
In the above example, if you don't feed input, then it will be read from the TFRecord input pipeline.
Is there a way to do it without placeholders at test time, though? It should be possible to reuse the graph with a new input pipeline without resorting to slow placeholders (i.e., the test dataset may be very large); placeholder_with_default is a suboptimal solution in that case.
The recommended way is to save two meta graphs: one for training/validation/testing, and the other for inference.
See Building a SavedModel:
export_dir = ...
...
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
with tf.Session(graph=tf.Graph()) as sess:
    ...
    builder.add_meta_graph_and_variables(sess,
                                         [tag_constants.TRAINING],
                                         signature_def_map=foo_signatures,
                                         assets_collection=foo_assets)
...
# Add a second MetaGraphDef for inference.
with tf.Session(graph=tf.Graph()) as sess:
    ...
    builder.add_meta_graph([tag_constants.SERVING])
...
builder.save()
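To load the inference graph back, a minimal sketch using the standard loader (export_dir is the directory passed to the builder above):
# Load the meta graph tagged SERVING into a fresh session; tensors can
# then be fetched by name, e.g. sess.graph.get_tensor_by_name(...).
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)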
The NMT tutorial also provides a detailed example of creating multiple graphs with shared variables: Neural Machine Translation (seq2seq) Tutorial - Building Training, Eval, and Inference Graphs
train_graph = tf.Graph()
eval_graph = tf.Graph()
infer_graph = tf.Graph()

with train_graph.as_default():
    train_iterator = ...
    train_model = BuildTrainModel(train_iterator)
    initializer = tf.global_variables_initializer()

with eval_graph.as_default():
    eval_iterator = ...
    eval_model = BuildEvalModel(eval_iterator)

with infer_graph.as_default():
    infer_iterator, infer_inputs = ...
    infer_model = BuildInferenceModel(infer_iterator)

checkpoints_path = "/tmp/model/checkpoints"

train_sess = tf.Session(graph=train_graph)
eval_sess = tf.Session(graph=eval_graph)
infer_sess = tf.Session(graph=infer_graph)

train_sess.run(initializer)
train_sess.run(train_iterator.initializer)

for i in itertools.count():
    train_model.train(train_sess)
    if i % EVAL_STEPS == 0:
        checkpoint_path = train_model.saver.save(train_sess, checkpoints_path, global_step=i)
        eval_model.saver.restore(eval_sess, checkpoint_path)
        eval_sess.run(eval_iterator.initializer)
        while data_to_eval:
            eval_model.eval(eval_sess)
    if i % INFER_STEPS == 0:
        checkpoint_path = train_model.saver.save(train_sess, checkpoints_path, global_step=i)
        infer_model.saver.restore(infer_sess, checkpoint_path)
        infer_sess.run(infer_iterator.initializer, feed_dict={infer_inputs: infer_input_data})
        while data_to_infer:
            infer_model.infer(infer_sess)

How to train different LSTMs in the same tensorflow session?

I would like to train two different LSTMs to make them interact in a dialogue context (i.e., one RNN generates a sequence, which is used as context by the second RNN, which answers, and so on). However, I do not know how to train them separately in TensorFlow (I think I have not fully understood the logic behind TF graphs). When I execute my code, I get the following error:
Variable rnn/basic_lstm_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?
The error happens when I create my second RNN. Do you know how to fix this?
My code is the following:
# User LSTM
no_units = 100
_seq_user = tf.placeholder(tf.float32, [batch_size, max_length_user, user_inputShapeLen], name='seq')
_seq_length_user = tf.placeholder(tf.int32, [batch_size], name='seq_length')
cell = tf.contrib.rnn.BasicLSTMCell(no_units)
output_user, hidden_states_user = tf.nn.dynamic_rnn(
    cell,
    _seq_user,
    dtype=tf.float32,
    sequence_length=_seq_length_user
)
out2_user = tf.reshape(output_user, shape=[-1, no_units])
out2_user = tf.layers.dense(out2_user, user_outputShapeLen)
out_final_user = tf.reshape(out2_user, shape=[-1, max_length_user, user_outputShapeLen])
y_user_ = tf.placeholder(tf.float32, [None, max_length_user, user_outputShapeLen])
softmax_user = tf.nn.softmax(out_final_user, dim=-1)
loss_user = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_user, labels=y_user_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_user)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(user_train_X, user_train_Y, sizes_user_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_user: data_, _seq_length_user: size_, y_user_: target_})

# System LSTM
no_units_system = 100
_seq_system = tf.placeholder(tf.float32, [batch_size, max_length_system, system_inputShapeLen], name='seq_')
_seq_length_system = tf.placeholder(tf.int32, [batch_size], name='seq_length_')
cell_system = tf.contrib.rnn.BasicLSTMCell(no_units_system)
output_system, hidden_states_system = tf.nn.dynamic_rnn(
    cell_system,
    _seq_system,
    dtype=tf.float32,
    sequence_length=_seq_length_system
)
out2_system = tf.reshape(output_system, shape=[-1, no_units])
out2_system = tf.layers.dense(out2_system, system_outputShapeLen)
out_final_system = tf.reshape(out2_system, shape=[-1, max_length_system, system_outputShapeLen])
y_system_ = tf.placeholder(tf.float32, [None, max_length_system, system_outputShapeLen])
softmax_system = tf.nn.softmax(out_final_system, dim=-1)
loss_system = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_system, labels=y_system_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_system)

for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(system_train_X, system_train_Y, sizes_system_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_system: data_, _seq_length_system: size_, y_system_: target_})
Regarding the variable scope error, try setting a different variable scope for each graph:
with tf.variable_scope('User_LSTM'):
    ...  # your user_lstm graph

with tf.variable_scope('System_LSTM'):
    ...  # your system_lstm graph
Also, avoid using the same name for different Python objects (e.g., optimizer). The second declaration will override the first, which will confuse you when you use TensorBoard.
By the way, I would recommend training the model in an end-to-end fashion rather than running two sessions separately. Try feeding the output tensor of the first LSTM into the second LSTM, with a single optimizer and loss function, as in the sketch below.
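A minimal sketch of that end-to-end idea, under the assumption that the user LSTM's outputs have a shape the system LSTM can consume directly (in practice you may need a projection layer in between):
# Two scoped LSTMs chained together and trained with one loss/optimizer.
with tf.variable_scope('User_LSTM'):
    user_cell = tf.contrib.rnn.BasicLSTMCell(no_units)
    user_out, _ = tf.nn.dynamic_rnn(user_cell, _seq_user, dtype=tf.float32)
with tf.variable_scope('System_LSTM'):
    system_cell = tf.contrib.rnn.BasicLSTMCell(no_units_system)
    system_out, _ = tf.nn.dynamic_rnn(system_cell, user_out, dtype=tf.float32)
logits = tf.layers.dense(system_out, system_outputShapeLen)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_system_))
# A single minimize() call updates the variables of both LSTMs.
train_op = tf.train.AdamOptimizer(10**-4).minimize(loss)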
In short, to solve the problem (Variable rnn/basic_lstm_cell/weights already exists), what you need are two separate variable scopes (as mentioned by @J-min). In TensorFlow, variables are organized by their names, and by managing the two sets of variables in two scopes, TensorFlow is able to distinguish them from each other.
By "train them separately", I suppose you want to define two distinct loss functions and optimize the two LSTM networks with two optimizers, each corresponding to one of the loss functions.
Under such circumstances, you need to get the lists of the two sets of variables and pass them to your optimizers, like this:
opt1 = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op1 = opt1.minimize(loss1, var_list=<list of variables from scope 1>)
opt2 = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op2 = opt2.minimize(loss2, var_list=<list of variables from scope 2>)
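The variable lists themselves can be collected by scope; a small sketch, assuming the scope names 'User_LSTM' and 'System_LSTM' from the first answer:
# Collect each LSTM's trainable variables by the scope it was built under.
user_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='User_LSTM')
system_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='System_LSTM')
opt_op1 = opt1.minimize(loss1, var_list=user_vars)
opt_op2 = opt2.minimize(loss2, var_list=system_vars)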
Just give each cell a different name argument when initializing it. For example:
user_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='user')
system_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='system')
In this way, TensorFlow won't share the variables of the two cells. Then you can get the outputs as:
output_user, hidden_states_user = tf.nn.dynamic_rnn(
    user_cell,
    _seq_user,
    dtype=tf.float32,
    sequence_length=_seq_length_user
)
output_system, hidden_states_system = tf.nn.dynamic_rnn(
    system_cell,
    _seq_system,
    dtype=tf.float32,
    sequence_length=_seq_length_system
)