TensorFlow is telling me that the momentum part of a variable is uninitialized when I use the momentum optimizer. When I use the GradientDescent optimizer, things work fine.
Here is a relevant part of the stack trace:
tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value fc3/biases/Momentum
[[Node: Momentum/update_fc3/biases/ApplyMomentum = ApplyMomentum[T=DT_FLOAT, _class=["loc:@fc3/biases"], use_locking=false, _device="/job:localhost/replica:0/task:0/cpu:0"](fc3/biases, fc3/biases/Momentum, Momentum/learning_rate, gradients/fc3/logits_grad/tuple/control_dependency_1, Momentum/momentum)]]
Caused by op u'Momentum/update_fc3/biases/ApplyMomentum', defined at:
train_op = vgg.optimizer.minimize(vgg.loss, global_step=vgg.global_step)
I think the code is correct: it defines the ops for all layers before the initialize-all-variables op, and so on. If it did not, the GradientDescent optimizer wouldn't work either, right?
Following up on @etarion's comment, here is a sketch of the code. It starts with:
def train(args):
    datareader = ...  # object to read data - no tensorflow code/import
    with tf.Graph().as_default():
        with_graph(datareader, args)
Then with_graph does:
def with_graph(datareader, args):
    num_outputs = datareader.num_outputs()
    img_orig = tf.placeholder(tf.float32, shape=datareader.features_placeholder_shape())
    img_vgg16 = preprocess.imgbatch_2_vgg16(imgs=img_orig, channel_mean=8.46)
    labels_placeholder = tf.placeholder(tf.float32, shape=(None, num_outputs))
    vgg = vgg16(imgs=img_vgg16, weights=None, sess=None, trainable=args.trainable, stop_at_fc2=args.fc2)
    add_loss(vgg, labels_placeholder, num_outputs, args)
    add_optimizer(vgg, args)
    sess = tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads = 12))
    init = tf.initialize_all_variables()
    validation_imgs_orig, validation_labels = datareader.get_validation_set()
    validation_imgs_vgg16 = sess.run(img_vgg16, {img_orig: validation_imgs_orig})
    validation_feed_dict = {img_vgg16:validation_imgs_vgg16,
    train_op = vgg.optimizer.minimize(vgg.loss, global_step=vgg.global_step)
    print("Starting training.")
    for step_number in range(3):
        t0 = time.time()
        train_imgs, train_labels = datareader.get_next_minibatch()
        train_feed_dict = {img_orig: train_imgs,
        sess.run(train_op, feed_dict=train_feed_dict)
        print("step %3d took %.2f sec." % (step_number, time.time()-t0))
The gradient descent optimizer has no internal variables, but the momentum optimizer does. Somehow the momentum state is not being initialized (I can't tell exactly why without the code). You can either initialize all variables right before you run the graph (after you have added the optimizer to the graph), or, if you want to be explicit about what you initialize, use the optimizer's get_slot_names()/get_slot() methods to get the Variables that make up its internal state.
The problem was that I defined the training op (via the optimizer's minimize function) after creating the initialize-all-variables op. Once I moved it before tf.initialize_all_variables(), it worked.
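A plain-Python sketch (not TensorFlow code) of why the two optimizers behave differently here. The function names and the toy loss f(x) = x**2 are assumptions made for illustration; the momentum rule uses the usual accumulator form, accum = momentum * accum + grad followed by var -= lr * accum. The point is that momentum carries one extra piece of state per variable, which has to exist and be initialized before the first update, whereas plain gradient descent has nothing extra to initialize.

```python
def sgd_step(var, grad, lr):
    """Plain gradient descent: no state besides the variable itself."""
    return var - lr * grad


def momentum_step(var, accum, grad, lr, momentum):
    """Momentum update: reads and writes an accumulator, the optimizer's extra state."""
    accum = momentum * accum + grad
    var = var - lr * accum
    return var, accum


def grad_fn(x):
    """Gradient of the toy loss f(x) = x**2 (an assumption for illustration)."""
    return 2.0 * x


lr, momentum = 0.1, 0.5
var_sgd = 1.0
var_mom, accum = 1.0, 0.0  # the accumulator must be initialized before the first step

for _ in range(3):
    var_sgd = sgd_step(var_sgd, grad_fn(var_sgd), lr)
    var_mom, accum = momentum_step(var_mom, accum, grad_fn(var_mom), lr, momentum)

print("sgd: %.4f  momentum: %.4f  accumulator: %.4f" % (var_sgd, var_mom, accum))
```

In the TensorFlow code above, the counterpart of `accum = 0.0` is the initializer of fc3/biases/Momentum, a variable that only exists once minimize() has been called. An init op built before that call cannot cover it.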
So I am trying to write a simple softmax classifier in TensorFlow.
Here is the code:
# Neural network parameters
n_hidden_units = 500
n_classes = 10
# training set placeholders
input_X = tf.placeholder(dtype='float32',shape=(None,X_train.shape[1], X_train.shape[2]),name="input_X")
input_y = tf.placeholder(dtype='int32', shape=(None,), name="input_y")
# hidden layer
dim = X_train.shape[1]*X_train.shape[2] # dimension of each training data point
flatten_X = tf.reshape(input_X, shape=(-1, dim))
weights_hidden_layer = tf.Variable(initial_value=np.zeros((dim,n_hidden_units)), dtype ='float32')
bias_hidden_layer = tf.Variable(initial_value=np.zeros((1,n_hidden_units)), dtype ='float32')
hidden_layer_output = tf.nn.relu(tf.matmul(flatten_X, weights_hidden_layer) + bias_hidden_layer)
# output layer
weights_output_layer = tf.Variable(initial_value=np.zeros((n_hidden_units,n_classes)), dtype ='float32')
bias_output_layer = tf.Variable(initial_value=np.zeros((1,n_classes)), dtype ='float32')
output_logits = tf.matmul(hidden_layer_output, weights_output_layer) + bias_output_layer
predicted_y = tf.nn.softmax(output_logits)
# loss
one_hot_labels = tf.one_hot(input_y, depth=n_classes, axis = -1)
loss = tf.losses.softmax_cross_entropy(one_hot_labels, output_logits)
# optimizer
optimizer = tf.train.MomentumOptimizer(0.01, 0.5).minimize(
    loss, var_list=[weights_hidden_layer, bias_hidden_layer, weights_output_layer, bias_output_layer])
This builds without errors, and I have checked that the shapes of all the tensors match what I expect.
However, I tried to run the optimizer using the following code:
# running the optimizer
s = tf.InteractiveSession()
for i in range(5):
    s.run(optimizer, {input_X: X_train, input_y: y_train})
    loss_i = s.run(loss, {input_X: X_train, input_y: y_train})
    print("loss at iter %i:%.4f" % (i, loss_i))
And the loss kept being the same in all iterations!
I must have messed up something, but I fail to see what.
Any ideas? I would also appreciate comments on code style and TensorFlow tips.
The mistake is that you are initializing your weights with np.zeros. Use np.random.normal instead. You can choose the standard deviation of this Gaussian distribution based on the number of inputs going into a particular neuron.
The reason to initialize from a Gaussian distribution is that you want to break symmetry. If all the weights are initialized to zero, you can work through backpropagation to see that all the weights evolve in the same way.
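The symmetry argument can be checked by hand on a tiny network. The sketch below is plain Python, not the code from the question: the network (two inputs, two tanh hidden units, one linear output, squared loss), the function name, and all numeric values are assumptions chosen for illustration. It computes the gradients of the hidden-layer weights for three initializations.

```python
import math


def hidden_weight_grads(w_hidden, w_out, x, target):
    """Return d(loss)/d(w_hidden), one row per hidden unit, for squared loss."""
    # Forward pass: tanh hidden layer, linear output.
    pre = [sum(w * xi for w, xi in zip(unit, x)) for unit in w_hidden]
    h = [math.tanh(p) for p in pre]
    y = sum(wo * hj for wo, hj in zip(w_out, h))
    # Backward pass through the output and the tanh nonlinearity.
    dy = 2.0 * (y - target)
    grads = []
    for wo, hj in zip(w_out, h):
        dpre = dy * wo * (1.0 - hj * hj)
        grads.append([dpre * xi for xi in x])
    return grads


x, target = [1.0, -2.0], 1.0
same = hidden_weight_grads([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5], x, target)
zero = hidden_weight_grads([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], x, target)
rand = hidden_weight_grads([[0.3, -0.1], [-0.4, 0.2]], [0.7, -0.5], x, target)

print("identical init:", same)
print("zero init:     ", zero)
print("distinct init: ", rand)
```

With identical starting weights both hidden units receive the same gradient, so they stay identical after every update. With all-zero weights the hidden-layer gradients vanish entirely, because the backward signal is multiplied by the zero output weights. Only the distinct initialization gives each unit its own gradient.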
Visualizing the weight histogram in TensorBoard makes this easier to see. I ran your code to do this. A few more lines are needed to set up TensorBoard logging, but the histogram summary of the weights is easy to add.
Initialized to zeros
weights_hidden_layer = tf.Variable(initial_value=np.zeros((784,n_hidden_units)), dtype ='float32')
Xavier initialization
initializer = tf.contrib.layers.xavier_initializer()
weights_hidden_layer = tf.Variable(initializer(shape=(784,n_hidden_units)), dtype ='float32')
When we use tf.train.ExponentialMovingAverage.apply(var) to maintain moving averages of variables and then update a variable, for example with tf.assign, we get the decayed (averaged) value through tf.train.ExponentialMovingAverage.average(var). If we fetch the variable directly with tf.Session.run(var), we get the value without decay.
For example:
import tensorflow as tf
v1 = tf.Variable(0, dtype=tf.float32)
ema = tf.train.ExponentialMovingAverage(0.99)
maintain_average = ema.apply([v1])
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run([v1, ema.average(v1)]))
    # Out:[0.0, 0.0]
    sess.run(tf.assign(v1, 5))
    sess.run(maintain_average)
    sess.run(tf.assign(v1, 10))
    sess.run(maintain_average)
    print(sess.run([v1, ema.average(v1)]))
    # Out: [10.0, 0.14949986]
So when we train a neural network with ExponentialMovingAverage, does the model default to using the decayed variable by tf.train.ExponentialMovingAverage.average()?
More concrete example:
image_tensor = tf.placeholder(tf.float32,
label_tensor = tf.placeholder(tf.int32,
net_output = creat_net(image_tensor)
# suppose creat_net() has built a neural network
global_step = tf.Variable(0, trainable=False)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=net_output, labels=label_tensor))
loss = cross_entropy
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
with tf.control_dependencies([train_step]):
    training_op = ema.apply(tf.trainable_variables())
So when I run training_op to train the network, will the network use the averages by default, or do I need extra code to use the decayed variables? In other words, will GradientDescentOptimizer use the true values or the decayed values to compute the loss in the next step?
v1 is a variable with its own value (10.0 in your case).
tf.train.ExponentialMovingAverage maintains a separate shadow variable internally, which is updated each time you run the op returned by apply(); average() only returns that shadow variable.
Each update computes the next time step of the exponential moving average, so it changes only the shadow variable held by tf.train.ExponentialMovingAverage and never the input variables. The optimizer therefore keeps training on the true values; you get the averaged values only when you read them explicitly through average().
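A plain-Python sketch of the shadow-variable update, shadow = decay * shadow + (1 - decay) * value. The helper name and the sequence of values are assumptions: the 0.1495 reported in the question is consistent with two updates, one after v1 was set to 5 and one after it was set to 10.

```python
def ema_update(shadow, value, decay):
    """One step of the moving average: decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


decay = 0.99
v1 = 0.0
shadow = v1  # the shadow copy starts at the variable's initial value

history = []
for new_value in (5.0, 10.0):
    v1 = new_value                          # plays the role of tf.assign(v1, new_value)
    shadow = ema_update(shadow, v1, decay)  # plays the role of running ema.apply([v1])
    history.append((v1, shadow))

print(history)
```

The variable itself always holds the last assigned value, while the shadow copy trails far behind it. Nothing in the update writes back into v1.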
The TensorFlow programmer's guide recommends using a feedable iterator to switch between training and validation datasets without reinitializing the iterator. It mainly requires feeding the handle to choose between them.
How can I use it together with tf.train.MonitoredTrainingSession?
The following method fails with "RuntimeError: Graph is finalized and cannot be modified." error.
with tf.train.MonitoredTrainingSession() as sess:
    training_handle = sess.run(training_iterator.string_handle())
    validation_handle = sess.run(validation_iterator.string_handle())
How can I get both the convenience of MonitoredTrainingSession and the ability to iterate over the training and validation datasets simultaneously?
I got the answer from the Tensorflow GitHub issue - https://github.com/tensorflow/tensorflow/issues/12859
The solution is to call iterator.string_handle() before creating the MonitoredSession. The call adds an op to the graph, and the graph is finalized once the session has been created.
import tensorflow as tf
from tensorflow.contrib.data import Dataset, Iterator
dataset_train = Dataset.range(10)
dataset_val = Dataset.range(90, 100)
iter_train_handle = dataset_train.make_one_shot_iterator().string_handle()
iter_val_handle = dataset_val.make_one_shot_iterator().string_handle()
handle = tf.placeholder(tf.string, shape=[])
iterator = Iterator.from_string_handle(
    handle, dataset_train.output_types, dataset_train.output_shapes)
next_batch = iterator.get_next()
with tf.train.MonitoredTrainingSession() as sess:
    handle_train, handle_val = sess.run([iter_train_handle, iter_val_handle])
    for step in range(10):
        print('train', sess.run(next_batch, feed_dict={handle: handle_train}))
        if step % 3 == 0:
            print('val', sess.run(next_batch, feed_dict={handle: handle_val}))
('train', 0)
('val', 90)
('train', 1)
('train', 2)
('val', 91)
('train', 3)
@Michael Jaison G's answer is correct. However, it does not work when you also want to use session_run_hooks that need to evaluate parts of the graph, such as LoggingTensorHook or SummarySaverHook.
The example below will cause an error:
import tensorflow as tf
dataset_train = tf.data.Dataset.range(10)
dataset_val = tf.data.Dataset.range(90, 100)
iter_train_handle = dataset_train.make_one_shot_iterator().string_handle()
iter_val_handle = dataset_val.make_one_shot_iterator().string_handle()
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, dataset_train.output_types, dataset_train.output_shapes)
feature = iterator.get_next()
pred = feature * feature
tf.summary.scalar('pred', pred)
global_step = tf.train.create_global_step()
summary_hook = tf.train.SummarySaverHook(save_steps=5,
                                         output_dir="summaries", summary_op=tf.summary.merge_all())
with tf.train.MonitoredTrainingSession(hooks=[summary_hook]) as sess:
    handle_train, handle_val = sess.run([iter_train_handle, iter_val_handle])
    for step in range(10):
        feat = sess.run(feature, feed_dict={handle: handle_train})
        pred_ = sess.run(pred, feed_dict={handle: handle_train})
        print('train: ', feat)
        print('pred: ', pred_)
        if step % 3 == 0:
            print('val', sess.run(feature, feed_dict={handle: handle_val}))
This will fail with error:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype string
[[Node: Placeholder = Placeholder[dtype=DT_STRING, shape=[], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
[[Node: cond/Switch_1/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_18_cond/Switch_1", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
The reason is that the hook tries to evaluate the graph as early as the first session.run([iter_train_handle, iter_val_handle]) call, whose feed_dict does not yet contain a handle.
The workaround is to override the hooks that cause the problem and change the code in before_run and after_run so that they only evaluate on session.run calls whose feed_dict contains the handle. You can access the feed_dict of the current session.run call via the run_context argument of before_run and after_run.
Alternatively, you can use the latest master of TensorFlow (post-1.4), which adds a run_step_fn function to MonitoredSession. It lets you specify the following step_fn, which avoids the error (at the expense of evaluating the if statement once per training iteration):
def step_fn(step_context):
    if handle_train is None:
        handle_train, handle_val = sess.run([iter_train_handle, iter_val_handle])
    return step_context.run_with_hooks(fetches=..., feed_dict=...)
There is a demo of using a placeholder in mot_session with SessionRunHook.
The demo switches between datasets by feeding different handle strings.
BTW, I have tried all the solutions, but only this one works.
I have used
tf.add_to_collection('Input', X)
tf.add_to_collection('TrueLabel', Y)
tf.add_to_collection('loss', loss)
tf.add_to_collection('accuracy', accuracy)
saver0 = tf.train.Saver()
saver0.save(sess, './save/model')
to save my model in one session scope. Then I restore it in another session scope. Currently I only have the training data, and I have saved the placeholders X and Y. However, I cannot use them at this point:
train_data, train_label = get_data()
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph('./save/model.meta')
    new_saver.restore(sess, './save/model')
    graph = sess.graph
    X = graph.get_collection('Input')
    Y = graph.get_collection('TrueLabel')
    loss = graph.get_collection('loss')
    accuracy = graph.get_collection('accuracy')
    for _ in range(5):
        loss_str, accuracy_str = sess.run([loss, accuracy], {X:train_data, Y:train_label})
        print('loss:{}, accuracy:{}'.format(loss_str, accuracy_str))
How can I do that? I found that the tutorial docs do not give a complete example.
I solved this myself. Once the graph and the variables are loaded, just obtain the placeholder with, for example, graph.get_tensor_by_name('Input:0'). Obtain the loss, the accuracy, and anything else you want to collect in the same way.
A full example can be found at https://github.com/sunkevin1214/TF_implementation/blob/master/test_funs/test_save_load.py
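One likely reason the code in the question fails (an assumption, since no error message was posted) is that graph.get_collection(...) returns a Python list, and a list cannot be used as a feed_dict key. The sketch below shows this in plain Python. FakeTensor is a hypothetical stand-in for a tensor object, not part of TensorFlow.

```python
class FakeTensor(object):
    """Hypothetical stand-in for a tensor; hashable like any ordinary object."""

    def __init__(self, name):
        self.name = name


# A collection lookup hands back a list, even when it holds a single item.
X = [FakeTensor("Input:0")]
train_data = [[0.0, 1.0], [1.0, 0.0]]

try:
    feed_dict = {X: train_data}  # a list cannot be a dictionary key
    error = None
except TypeError as exc:
    error = str(exc)

feed_dict = {X[0]: train_data}   # unwrap the single element first
print(error)
print(list(feed_dict.keys())[0].name)
```

Under that assumption, indexing the collection result (X[0]) would be an alternative to get_tensor_by_name, which returns the tensor itself.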
I am confused about the difference between an optimizer's apply_gradients and minimize in TensorFlow. For example,
optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
and
optimizer = tf.train.AdamOptimizer(1e-3)
train_op = optimizer.minimize(cnn.loss, global_step=global_step)
Are they indeed the same?
If I want to decay the learning rate, can I use the following code?
global_step = tf.Variable(0, name="global_step", trainable=False)
starter_learning_rate = 1e-3
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100, FLAGS.decay_rate, staircase=True)
# Passing global_step to apply_gradients() will increment it at each step.
optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
Thanks for your help!
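For reference, a plain-Python sketch of the schedule that exponential decay describes: learning_rate * decay_rate ** (global_step / decay_steps), with the exponent truncated to an integer when staircase is on. The decay rate of 0.5 is an assumption, since the question takes it from FLAGS.decay_rate.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Return learning_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        # Integer division: the rate drops in discrete steps every decay_steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / float(decay_steps)
    return learning_rate * decay_rate ** exponent


# decay_rate=0.5 is an assumed value; the question reads it from FLAGS.decay_rate.
for step in (0, 99, 100, 250):
    print(step, exponential_decay(1e-3, step, 100, 0.5, staircase=True))
```

The schedule only advances if global_step is incremented, which is why the step counter has to be passed to the call that applies the gradients.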
According to https://www.tensorflow.org/get_started/get_started (the tf.train API part), they do the same job.
The difference is that if you use the separate functions (optimizer.compute_gradients and optimizer.apply_gradients), you can apply other mechanisms between them, such as gradient clipping.
The documentation says that minimize uses tf.GradientTape and then apply_gradients:
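A plain-Python sketch of the three-step pattern: compute the gradients, transform them, then apply them. The helper names mirror the optimizer methods but are written here from scratch, and the toy loss sum(p**2), the clip norm, and the learning rate are all assumptions for illustration.

```python
import math


def compute_gradients(params):
    """Gradients of the toy loss sum(p**2), paired with their parameters."""
    return [(2.0 * p, p) for p in params]


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads]


def apply_gradients(grads_and_vars, lr):
    """One plain gradient-descent update per (gradient, parameter) pair."""
    return [p - lr * g for g, p in grads_and_vars]


params = [3.0, 4.0]
grads_and_vars = compute_gradients(params)                           # step 1: compute
clipped = clip_by_global_norm([g for g, _ in grads_and_vars], 5.0)   # step 2: transform
new_params = apply_gradients(list(zip(clipped, params)), lr=0.1)     # step 3: apply

print(clipped, new_params)
```

A single minimize-style call would go straight from step 1 to step 3, leaving no place to insert step 2.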
Minimize loss by updating var_list.
This method simply computes gradient using tf.GradientTape and calls
apply_gradients(). If you want to process the gradient before applying
then call tf.GradientTape and apply_gradients() explicitly instead of
using this function.
So minimize actually uses apply_gradients just like:
def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
    grads_and_vars = self._compute_gradients(loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
    return self.apply_gradients(grads_and_vars, name=name)
In your example, you use compute_gradients and apply_gradients. This is valid, but compute_gradients has since been made private, so using it is no longer good practice. For this reason the function no longer appears in the documentation.