Wrong output for restored variable in tensorflow graph

I'm currently toying around with saving and restoring variables. For this purpose, I created two scripts: one saves a simple graph while the other restores it. Here is the test script for saving the graph:
import tensorflow as tf

a = tf.Variable(3.0, name='a')
b = tf.Variable(5.0, name='b')
b = tf.assign_add(b, a)

n_steps = 5
global_step = tf.Variable(0, name='global_step', trainable=False)

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(n_steps):
        print(sess.run(b))
        global_step.assign_add(1).eval()
        print(global_step.eval())
    saver.save(sess, './my_test_model', global_step=global_step)
Basically, I want to run through a loop 5 times, and every time I do this, I add a to b. I also want to keep track of the number of steps via global_step. This works as intended. The output is:
8.0 # value of b
1 # step
11.0
2
14.0
3
17.0
4
20.0
5
Now when restoring the variables, I try to get all three of them. Script is:
import tensorflow as tf
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

# List ALL tensors.
print_tensors_in_checkpoint_file(tf.train.latest_checkpoint('./'), all_tensors=True, tensor_name='')

tf.reset_default_graph()

a = tf.get_variable('a', shape=[])
b = tf.get_variable('b', shape=[])
global_step = tf.get_variable('global_step', shape=[])

saver = tf.train.Saver()

with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('./')
    if ckpt:
        print(ckpt)
        saver.restore(sess, ckpt)
    else:
        print('Nothing restored')
    print(a.eval())
    print(b.eval())
    print(global_step.eval())
The output of this is
tensor_name: a
3.0
tensor_name: b
20.0
tensor_name: global_step
5
./my_test_model-5
INFO:tensorflow:Restoring parameters from ./my_test_model-5
3.0
20.0
7e-45
How is it possible that the value for global_step is stored correctly in the checkpoint, but upon evaluation I get this small 7e-45? Also, upon restoring, I seem to be unable to define any additional variables as it states it cannot find the variable in the checkpoint. How can I, for example, define a variable and add it to the b of the restored graph?
Thank you for your help!

This doesn't appear to be well documented in the TF docs, but you should specify the dtype for the global_step variable.
Incorrect
global_step = tf.get_variable('global_step', shape=[], dtype=tf.float32)
results in global_step=7e-45. The type is assumed to be tf.float32 by default.
Correct
global_step = tf.get_variable('global_step', shape=[], dtype=tf.int32)
results in global_step=5
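As a side note, the odd value is simply the stored integer reinterpreted as a float32 bit pattern. A quick check with numpy (not part of the original scripts) confirms it:

import numpy as np

# The int32 bit pattern 5, read back as a float32, is a denormal of about 7e-45,
# which is exactly what the restore with the wrong dtype produced.
print(np.array([5], dtype=np.int32).view(np.float32))  # [7.e-45]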

Related

TensorFlow get updates computed by optimizer

In TensorFlow, the optimizer class exposes two main functions:
compute_gradients
apply_gradients
where apply_gradients returns an op that performs the update w <- w + Δw via a tf.assign_add function.
However, I need direct access to the Δw itself (or w' = w + Δw). I know that the optimizer adds nodes to the computational graph which compute this Δw for each variable. How can I access them? Or do I have to re-implement the optimizer myself?
The reason is that I need to compute the gradients dw'/dw, as I am working on something related to gradient-based hyperparameter optimization (cf. https://arxiv.org/abs/1703.01785).
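For reference, a minimal sketch of that two-step API on a toy variable and loss (the names here are illustrative, not from the original question):

import tensorflow as tf

w = tf.Variable(1.0)
loss = tf.square(w)
opt = tf.train.GradientDescentOptimizer(0.1)

# The two steps mentioned above:
grads_and_vars = opt.compute_gradients(loss)    # list of (gradient, variable) pairs
train_op = opt.apply_gradients(grads_and_vars)  # op that applies the update to each variable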
The "delta" applied to each variable is not accessible through any common method or name. In fact, looking a bit into the source it seems rather difficult extract, as it varies from one optimizer to the other.
What you can do, at least, is to compute the differences between variable values and their updates. For example it could work like this:
import tensorflow as tf

with tf.Graph().as_default():
    # Setup example model
    x = tf.placeholder(tf.float32, [None, 1])
    y = tf.placeholder(tf.float32, [None, 2])
    w = tf.Variable([[1., 2.]], tf.float32)
    pred = x @ w
    loss = (tf.reduce_sum(tf.squared_difference(pred, y))
            / tf.cast(tf.shape(x)[0], tf.float32))
    # Record variable values before training step
    # (tf.identity should work here but it does not so we use
    # a trivial add operation to enforce the control dependency)
    w_old = w + 0
    # Train after having recorded variable values
    with tf.control_dependencies([w_old]):
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    # Compute deltas after training
    with tf.control_dependencies([train_op]):
        w_delta = w.read_value() - w_old
    init_op = tf.global_variables_initializer()
    # Test
    with tf.Session() as sess:
        sess.run(init_op)
        print(sess.run(w))
        # [[1. 2.]]
        _, w_delta_val = sess.run(
            [train_op, w_delta],
            feed_dict={x: [[1.], [2.]], y: [[3., 4.], [5., 6.]]})
        print(w_delta_val)
        # [[0.79999995 0.5999999 ]]
        print(sess.run(w))
        # [[1.8 2.6]]
To get the updated w', just print w directly after you have executed optimizer.apply_gradients(); at that point w already holds w'.
Then, if you want the update (delta) applied to w, just compute w' - w.
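A minimal sketch of that approach, again on a toy variable and loss (the names are illustrative):

import tensorflow as tf

w = tf.Variable([[1., 2.]], dtype=tf.float32)
loss = tf.reduce_sum(tf.square(w))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_before = sess.run(w)       # w
    sess.run(train_op)           # applies the update to w
    w_after = sess.run(w)        # w'
    print(w_after - w_before)    # the delta; here -0.2 * w_before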

Error upon restoring variable

I stumbled across an error that I am unable to resolve. What I am trying to do is the following thing:
I want to train a (dummy) model that adds a to b on every iteration. When finished, I want to save the variables as a checkpoint. The first time I run it, it should build the model from scratch. Every time I re-run it, it should start from the last checkpoint and do the additions again. To do this, I load the complete graph from the .meta file. The global_step variable is there to keep track of the total number of steps I have trained.
import tensorflow as tf
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

# List ALL tensors.
print_tensors_in_checkpoint_file(tf.train.latest_checkpoint('./'), all_tensors=True, tensor_name='')

tf.reset_default_graph()

global_step = tf.get_variable('global_step', shape=[], dtype=tf.int32, initializer=tf.constant_initializer(0), trainable=False)

def model(a, b):
    b = tf.assign_add(b, a)
    return b

with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('./')
    if ckpt:
        saver = tf.train.import_meta_graph('./my_test_model-1.meta')
        saver.restore(sess, ckpt)
    else:
        a = tf.Variable(3.0, name='a')
        b = tf.Variable(5.0, name='b')
        b = model(a, b)

        ### before EDIT
        saver = tf.train.Saver()
        sess.run(tf.global_variables_initializer())
        ###

        ### after EDIT
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver()
        ###

    for step in range(5):
        global_step.assign_add(1).eval()
        print(global_step.eval())
        print(b.eval())
    saver.save(sess, './my_test_model', global_step=global_step)
The script runs fine for the first time, outputting this:
1 # step
8.0 # value of b
2
11.0
3
14.0
4
17.0
5
20.0
The second time I run the program, I get this output followed by an error:
tensor_name: a
3.0
tensor_name: b
20.0
tensor_name: global_step
0
tensor_name: global_step_1
5
INFO:tensorflow:Restoring parameters from ./my_test_model-5
Traceback (most recent call last): ... FailedPreconditionError:
Attempting to use uninitialized value global_step [[Node:
AssignAdd_2 = AssignAdd[T=DT_INT32, use_locking=false,
_device="/job:localhost/replica:0/task:0/device:CPU:0"](global_step, AssignAdd_2/value)]] ...
The first time, it's clear that it won't throw an error as I run the initializer for all variables. But I thought that restoring a model counts as some sort of initialization? I really cannot wrap my head around this concept. I also tried defining global_step after defining a and b, but this resulted in another error when loading for the first time:
ValueError: Cannot use the default session to evaluate tensor: the
tensor's graph is different from the session's graph. Pass an explicit
session to eval(session=sess).
The error refers to the line that increments global_step (global_step.assign_add(1).eval()).
What am I doing wrong? Where should I define the variable?
I appreciate any help on this problem! Thank you for reading this far.
EDIT:
Thanks to @Diana, the precondition error vanished. Unfortunately, another error occurred. Whenever I run the script so that it loads a checkpoint, it throws a name error:
NameError: name 'global_step' is not defined.
This also happens for the variable b. Shouldn't the name be loaded when restoring the checkpoint? The tensors seem to have the right names and values when I check the tensors in the checkpoint file.
You should create the Saver after you have run the initializer. Otherwise the Saver does not know about the variables yet and will not save their values.
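A minimal sketch of the ordering suggested above (variable names are only illustrative):

import tensorflow as tf

a = tf.Variable(3.0, name='a')
b = tf.Variable(5.0, name='b')

with tf.Session() as sess:
    # Run the initializer first, then create the Saver, then save.
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    saver.save(sess, './my_test_model')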

tensorflow - assign name to optimizer for future restoration

I create a model in TensorFlow, and one of the last lines in it is:
import tensorflow as tf
...
train_step = tf.train.AdagradOptimizer(LEARNING_RATE).minimize(some_loss_function)
I wonder if I can give this tensor/operation a name, so that I can restore it by name after saving to disk?
Alternatively, if I cannot give it a name, how can I find it in the output of
the following command:
tf.get_default_graph().get_operations()
According to the docs for tf.train.Optimizer: yes, yes you can.
train_step = tf.train.AdamOptimizer().minimize(loss, name='my_training_step')
You can then restore the op later with:
saver = tf.train.Saver(...)
sess = tf.Session()
saver.restore(sess, 'path/to/model')
train_op = sess.graph.get_operation_by_name('my_training_step')
You can also store the training operation in a collection and restore it by importing the meta graph. Adding to a collection and saving looks like:
saver = tf.train.Saver(...)
tf.add_to_collection('train_step', train_step)
# ...
with tf.Session() as sess:
    # ...
    saver.save(sess, ...)
And restoring looks like:
new_saver = tf.train.import_meta_graph('path/to/metagraph')
new_saver.restore(sess, 'path/to/model')
train_op = tf.get_collection('train_step')[0] # restore the op

Tensorflow save and restore variables are not the same

It's from the Udacity deep learning foundation course. It seems to work for them, but it doesn't work on my computer. Please have a look. I appreciate your help!
The TensorFlow version is 1.0.0 both in the lecture and on my computer.
import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
Insert tf.reset_default_graph() after importing tensorflow.
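A minimal sketch of where that call would go, assuming the rest of the script stays as above; presumably this matters when the script is re-run in the same Python process (e.g. in a notebook), where leftover variables from an earlier run can end up with names that no longer match the checkpoint:

import tensorflow as tf

tf.reset_default_graph()  # start from an empty graph so variable names match the checkpoint

save_file = './model.ckpt'
# ... rest of the script unchanged ...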
I ran your code in 1.1.0, the results are the same...

What's the difference between tf.cond and if-else?

What is the difference between tf.cond and if-else?
Scenario 1
import tensorflow as tf

x = 'x'
y = tf.cond(tf.equal(x, 'x'), lambda: 1, lambda: 0)
with tf.Session() as sess:
    print(sess.run(y))

x = 'y'
with tf.Session() as sess:
    print(sess.run(y))
Scenario 2
import tensorflow as tf

x = tf.Variable('x')
y = tf.cond(tf.equal(x, 'x'), lambda: 1, lambda: 0)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(sess.run(y))

tf.assign(x, 'y')
with tf.Session() as sess:
    init.run()
    print(sess.run(y))
The outputs are both 1.
Does it mean that only tf.placeholder can work here, and not other tensors such as tf.Variable? When should I choose an if-else condition, and when should I use tf.cond? What are the differences between them?
tf.cond is evaluated at runtime, whereas if-else is evaluated at graph construction time.
If you want the condition to depend on the value of a tensor at runtime, tf.cond is the best option.
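A small sketch of that contrast (a toy example, not from the original question): a Python if on a tensor fails because the tensor has no value at graph construction time, while tf.cond builds both branches and picks one at runtime.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=())

# A Python if-else would need the value of x while the graph is being built:
# if x > 0:   # TypeError: Using a tf.Tensor as a Python bool is not allowed
#     y = x
# else:
#     y = -x

# tf.cond puts both branches into the graph and chooses one at runtime.
y = tf.cond(x > 0, lambda: x, lambda: -x)

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: -3.0}))  # 3.0
    print(sess.run(y, feed_dict={x: 3.0}))   # 3.0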
Did you mean if ... else in Python vs. tf.cond?
You can use if ... else for creating different graphs for different external conditions. For example, you can make one Python script for graphs with 1, 2 or 3 hidden layers, and use command-line parameters to select which one to use.
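A minimal sketch of that idea, with the layer count hard-coded where a command-line argument would normally be parsed:

import tensorflow as tf

n_hidden_layers = 2  # would normally come from e.g. argparse

x = tf.placeholder(tf.float32, [None, 10])
h = x
# Ordinary Python control flow: resolved while the graph is built,
# so only the chosen layers end up in the graph.
for _ in range(n_hidden_layers):
    h = tf.layers.dense(h, 32, activation=tf.nn.relu)
out = tf.layers.dense(h, 1)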
tf.cond is for adding a conditional block to the graph. For example, you can define the Huber function with code like this:
import tensorflow as tf

delta = tf.constant(1.)
x = tf.placeholder(tf.float32, shape=())

def left(x):
    return tf.multiply(x, x) / 2.

def right(x):
    return tf.multiply(delta, tf.abs(x) - delta / 2.)

hubber = tf.cond(tf.abs(x) <= delta, lambda: left(x), lambda: right(x))
The calculation in the graph will then follow a different branch for different input data:
sess = tf.Session()
with sess.as_default():
    sess.run(tf.global_variables_initializer())
    print(sess.run(hubber, feed_dict={x: 0.5}))
    print(sess.run(hubber, feed_dict={x: 1.0}))
    print(sess.run(hubber, feed_dict={x: 2.0}))

> 0.125
> 0.5
> 1.5
Since the graph in TensorFlow is static, you cannot modify it once it is built. You can therefore use if-else outside of the graph at any time, for example while preparing batches, but you can also employ it while constructing the graph, as long as the condition does not depend on the value of any tensor (for example, it depends only on a dimension or shape that is already fixed). In such scenarios the condition does not change the graph at execution time: the graph is fixed once you have finished building it, and the if-else condition has no effect while the graph is executed.
But if the condition depends on the value of a tensor, the condition must be included in the graph itself, and hence tf.cond should be applied.
Simply put: if-else is how you do a switch in Python, while tf.cond is how you do a switch in TensorFlow. At run time, the if-else choice has already been fixed in the Python program, while the tf.cond choice is made inside the constructed TensorFlow graph.
You can think of tf.cond as TensorFlow's internal way of doing if-else.