I'm referring to the Note at tf.layers.batch_normalization:
Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
How would one implement this in a custom Estimator? For example, looking at this example on TensorFlow's website: The complete abalone model_fn
At the very bottom of the following issue there is an example:
https://github.com/tensorflow/tensorflow/issues/16455
if mode == tf.estimator.ModeKeys.TRAIN:
    lr = 0.001
    optimizer = tf.train.RMSPropOptimizer(learning_rate=lr, decay=0.9)
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode,
                                      loss=loss,
                                      train_op=train_op)
I guess you can pass the train_op you refer to as the train_op parameter of the EstimatorSpec.
Related
I built my model following https://www.tensorflow.org/tutorials/estimators/cnn.
I added a SummarySaverHook to my model:
summary_hook = tf.train.SummarySaverHook(
    100,
    output_dir='C:/Users/dir',
    summary_op=tf.summary.merge_all())
# Configure the Training Op (for TRAIN mode)
if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op,
                                      training_hooks=[summary_hook])
But when I run it, I only get an enqueue_input chart (I don't know what it is) and the model graph. I want accuracy and loss charts.
So I want a couple of things in my TensorBoard:
Loss and accuracy charts.
An accuracy chart over time, because with the Estimator I only get the accuracy after the final step.
Can I get more details in TensorBoard, like wrongly predicted images, without creating a Session and Graph myself, only from the Estimator API?
First of all, you don't need to use summary_hook. You just need to specify the desired metrics with tf.metrics right after you define the logits.
logits = tf.layers.dense(inputs=dropout, units=10)
predictions = {
    "classes": tf.argmax(input=logits, axis=1),
    "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
}
accuracy = tf.metrics.accuracy(labels=labels, predictions=predictions['classes'])
tf.summary.scalar('acc', accuracy[1])
And put this
tf.logging.set_verbosity(tf.logging.INFO)
right after your imports, if you haven't done so already.
You can plot evaluation metrics by passing an eval_metric_ops = {'accuracy': accuracy} dict to tf.estimator.EstimatorSpec.
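For example, a minimal sketch of the EVAL branch (assuming loss and the accuracy tuple are defined as above):
if mode == tf.estimator.ModeKeys.EVAL:
    # tf.metrics.accuracy returns a (value, update_op) tuple,
    # which is exactly what eval_metric_ops expects
    return tf.estimator.EstimatorSpec(
        mode=mode,
        loss=loss,
        eval_metric_ops={'accuracy': accuracy})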
You can use tf.summary for visualizing images, weights and biases, etc.
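For instance, to log a few wrongly predicted images, here is a hedged sketch; input_layer is assumed to be the 4-D [batch, 28, 28, 1] image tensor from the tutorial:
# boolean mask marking the examples the model got wrong
wrong = tf.not_equal(tf.cast(labels, tf.int64), predictions['classes'])
# log up to 10 of the misclassified images per summary step
tf.summary.image('misclassified', tf.boolean_mask(input_layer, wrong), max_outputs=10)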
I am trying to use tf.train.exponential_decay with predefined estimators and this is proving to be super difficult for some reason. Am I missing something here?
Here is my old code with constant learning_rate:
classifier = tf.estimator.DNNRegressor(
    feature_columns=f_columns,
    model_dir='./TF',
    hidden_units=[2, 2],
    optimizer=tf.train.ProximalAdagradOptimizer(
        learning_rate=0.50,
        l1_regularization_strength=0.001,
    ))
Now I tried adding this:
starter_learning_rate = 0.50
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           10000, 0.96, staircase=True)
but now what?
estimator.predict() does not accept global_step so it will be stuck at 0?
Even if I pass learning_rate to tf.train.ProximalAdagradOptimizer() I get an error saying
"ValueError: Tensor("ExponentialDecay:0", shape=(), dtype=float32)
must be from the same graph as
Tensor("dnn/hiddenlayer_0/kernel/part_0:0", shape=(62, 2),
dtype=float32_ref)."
Your help is greatly appreciated. I am using TF1.6 btw.
You should put the optimizer under mode == tf.estimator.ModeKeys.TRAIN in a custom model_fn.
Here is some sample code:
def _model_fn(features, labels, mode, config):
    # xxxxxxxxx
    # xxxxxxxxx
    assert mode == tf.estimator.ModeKeys.TRAIN

    global_step = tf.train.get_global_step()
    decay_learning_rate = tf.train.exponential_decay(learning_rate, global_step,
                                                     100, 0.98, staircase=True)
    optimizer = tf.train.AdagradOptimizer(decay_learning_rate)
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = optimizer.minimize(loss, global_step=global_step)
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op,
                                      training_chief_hooks=chief_hooks,
                                      eval_metric_ops=metrics)
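You can then hand this model_fn to an Estimator as usual (a sketch; train_input_fn is assumed to be your input function):
estimator = tf.estimator.Estimator(model_fn=_model_fn, model_dir='./model')
estimator.train(input_fn=train_input_fn, steps=1000)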
I can get the training loss at every global step. But I also want to add the evaluation loss to the 'lossxx' graph in TensorBoard. How can I do that?
class MyHook(tf.train.SessionRunHook):
    def after_run(self, run_context, run_value):
        _session = run_context.session
        _session.run(_session.graph.get_operation_by_name('acc_op'))

def my_model(features, labels, mode):
    ...
    logits = tf.layers.dense(net, 3, activation=None)
    predicted_classes = tf.argmax(logits, 1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'class': predicted_classes,
            'prob': tf.nn.softmax(logits)
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # Compute loss.
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    acc, acc_op = tf.metrics.accuracy(labels=labels, predictions=predicted_classes)
    tf.identity(acc_op, 'acc_op')
    loss_sum = tf.summary.scalar('lossxx', loss)
    accuracy_sum = tf.summary.scalar('accuracyxx', acc)
    merg = tf.summary.merge_all()

    # Create training op.
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op,
                                          training_chief_hooks=[
                                              tf.train.SummarySaverHook(save_steps=10,
                                                                        output_dir='./model',
                                                                        summary_op=merg)])

    return tf.estimator.EstimatorSpec(
        mode, loss=loss, eval_metric_ops={'accuracy': (acc, acc_op)}
    )

classifier.train(input_fn=train_input_fn, steps=1000, hooks=[MyHook()])
You actually don't need to create a SummarySaverHook yourself, as it is already included in tf.estimator.Estimator. Just create all the summaries you want with tf.summary.xxx and they will all be evaluated every n steps (see tf.estimator.RunConfig for this).
Also, you don't need to create a summary for your final loss; this will be created for you automatically. If you do it like this, the training and evaluation summaries will be shown in the same graph on TensorBoard. The estimator creates a sub-directory eval inside your current model_dir to achieve this.
And a small hint: use acc_op directly in summaries; it updates the metric and returns its current value. However, the tf.metrics functions are quite difficult to handle ;-)
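For example (same names as in the code above):
# acc_op both updates the running accuracy and returns its current value
tf.summary.scalar('accuracyxx', acc_op)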
You need to pass evaluation data to the model alongside the training data by using tf.estimator.train_and_evaluate.
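A minimal sketch (train_input_fn and eval_input_fn are assumed to be your input functions):
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, steps=100)
# alternates training and evaluation; eval summaries land in model_dir/eval
tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)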
A more accurate description of this issue is that MobileNet behaves badly when is_training is not explicitly set to True.
And I'm referring to the MobileNet that is provided by TensorFlow in their model repository https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.py.
This is how I create the net (phase_train=True):
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=phase_train)):
    features, endpoints = mobilenet_v1.mobilenet_v1(
        inputs=images_placeholder, features_layer_size=features_layer_size,
        dropout_keep_prob=dropout_keep_prob, is_training=phase_train)
I'm training a recognition network, and while training I test on LFW. The results I get during training improve over time and reach good accuracy.
Before deployment I freeze the graph. If I freeze the graph with is_training=True, the results I get on LFW are the same as during training.
But if I set is_training=False, I get results as if the network hadn't trained at all...
This behavior actually happens with other networks like Inception.
I tend to believe that I miss something very fundamental here and that this is not a bug in TensorFlow...
Any help would be appreciated.
Adding more code...
This is how I prepare for training:
images_placeholder = tf.placeholder(tf.float32, shape=(None, image_size, image_size, 1), name='input')
labels_placeholder = tf.placeholder(tf.int32, shape=(None))
dropout_placeholder = tf.placeholder_with_default(1.0, shape=(), name='dropout_keep_prob')
phase_train_placeholder = tf.Variable(True, name='phase_train')
global_step = tf.Variable(0, name='global_step', trainable=False)

# build graph
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=phase_train_placeholder)):
    features, endpoints = mobilenet_v1.mobilenet_v1(
        inputs=images_placeholder, features_layer_size=512, dropout_keep_prob=1.0,
        is_training=phase_train_placeholder)

# loss
logits = slim.fully_connected(inputs=features, num_outputs=train_data.get_class_count(),
                              activation_fn=None,
                              weights_initializer=tf.truncated_normal_initializer(stddev=0.1),
                              weights_regularizer=slim.l2_regularizer(scale=0.00005),
                              scope='Logits', reuse=False)
tf.losses.sparse_softmax_cross_entropy(labels=labels_placeholder, logits=logits,
                                       reduction=tf.losses.Reduction.MEAN)
loss = tf.losses.get_total_loss()

# normalize output for inference
embeddings = tf.nn.l2_normalize(features, 1, 1e-10, name='embeddings')

# optimizer
optimizer = tf.train.AdamOptimizer()
train_op = optimizer.minimize(loss, global_step=global_step)
This is my train step:
batch_data, batch_labels = train_data.next_batch()
feed_dict = {
    images_placeholder: batch_data,
    labels_placeholder: batch_labels,
    dropout_placeholder: dropout_keep_prob
}
_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
I could add the code for how I freeze the graph, but it's not really necessary: it's enough to build the graph with is_training=False, load the latest checkpoint, and run the evaluation on LFW to reproduce the problem.
Update...
I found that the problem is in the batch normalization layer. It's enough to set this layer to is_training=False to reproduce the problem.
References that I found after discovering this:
http://ruishu.io/2016/12/27/batchnorm/
https://github.com/tensorflow/tensorflow/issues/10118
Batch Normalization - Tensorflow
Will update with a solution once I have a tested one.
So I found a solution.
Mainly using this reference: http://ruishu.io/2016/12/27/batchnorm/
From the link:
Note: When is_training is True the moving_mean and moving_variance need to be updated, by default the update_ops are placed in tf.GraphKeys.UPDATE_OPS so they need to be added as a dependency to the train_op, example:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)
    total_loss = control_flow_ops.with_dependencies([updates], total_loss)
And to get straight to the point: instead of creating the optimizer like this,
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(total_loss, global_step=global_step)
Do it like this:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(total_loss, global_step=global_step)
That will solve the issue.
is_training should not have this effect. I would need to see more of your code to understand what is happening, but the odds are that the variable names are not matching when you set is_training to False, probably because of a variable scope reuse issue.
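One way to check this is to compare the variable names in the built graph against those stored in the checkpoint; a diagnostic sketch, assuming your checkpoints live under ./model:
# variables stored in the latest checkpoint
for name, shape in tf.train.list_variables('./model'):
    print('checkpoint:', name, shape)
# variables in the currently built graph
for v in tf.global_variables():
    print('graph:', v.name)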
I am confused about the difference between apply_gradients and minimize of an optimizer in TensorFlow. For example,
optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
and
optimizer = tf.train.AdamOptimizer(1e-3)
train_op = optimizer.minimize(cnn.loss, global_step=global_step)
Are they the same indeed?
If I want to decay the learning rate, can I use the following code?
global_step = tf.Variable(0, name="global_step", trainable=False)
starter_learning_rate = 1e-3
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100, FLAGS.decay_rate, staircase=True)
# Passing global_step to apply_gradients() will increment it at each step.
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
Thanks for your help!
You can easily see from this link: https://www.tensorflow.org/get_started/get_started
(the tf.train API part) that they actually do the same job.
The difference is that if you use the separate functions (compute_gradients, apply_gradients), you can apply other mechanisms between them, such as gradient clipping.
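For example, a hedged sketch of gradient clipping between the two calls (the clipping norm of 5.0 is an arbitrary illustrative value; cnn.loss and global_step are taken from the question):
optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
# clip each gradient to a maximum L2 norm before applying it
clipped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped, global_step=global_step)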
Here it says minimize uses tf.GradientTape and then apply_gradients:
Minimize loss by updating var_list.
This method simply computes gradient using tf.GradientTape and calls
apply_gradients(). If you want to process the gradient before applying
then call tf.GradientTape and apply_gradients() explicitly instead of
using this function.
So minimize actually uses apply_gradients just like:
def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
    grads_and_vars = self._compute_gradients(loss, var_list=var_list,
                                             grad_loss=grad_loss, tape=tape)
    return self.apply_gradients(grads_and_vars, name=name)
In your example you use compute_gradients and apply_gradients, which is indeed valid; but nowadays compute_gradients has been made private, so it is no longer good practice to use it. For this reason the function no longer appears in the documentation.
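In TF2, to process gradients before applying them, the recommended path is therefore an explicit tf.GradientTape. A minimal sketch (model, loss_fn, x, and y are placeholders for your own objects):
optimizer = tf.keras.optimizers.Adam(1e-3)
with tf.GradientTape() as tape:
    loss = loss_fn(model(x), y)
# compute gradients explicitly so they can be modified before applying
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))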