Add validation summary - tensorflow

How can I add validation summaries to TensorBoard? I have written wrappers for layers, like:
def convolution(input_data, kernel_shape, strides, activation, stddev=0.1, name=None):
    with tf.name_scope(name):
        kernel = tf.Variable(tf.truncated_normal(kernel_shape, stddev=stddev), name="weights")
        bias = tf.Variable(tf.zeros([kernel_shape[-1]]), name="biases")
        conv = tf.nn.conv2d(input=input_data, filter=kernel, strides=strides, padding="SAME", name="convolutions")
        result = activation(tf.nn.bias_add(conv, bias), name="activations")
        tf.scalar_summary(name + "/mean", tf.reduce_mean(kernel))
        return result
and use summary_op = tf.merge_all_summaries() in main. I have also implemented train_op and valid_op, which both call the inference function. However, an error appears saying that we have duplicate tags for a scalar_summary: inference is used by both train_op and valid_op, which leads to duplication of, say, the conv1/mean summary.
How can I make this work? What I need is to run training and validation using the same inference function.

As the error suggests, you cannot have two summaries with the same tag. This happens in your case because you are calling tf.scalar_summary twice with the same tag: once when constructing the train_op and once when constructing the valid_op. Here is a possible solution:
You can add a flag to your inference function, say is_training, to indicate that the code is being called to construct part of a training graph. You would have to thread that flag through all your layer functions. In convolution, for instance, you would do the following:
    if is_training:
        tf.scalar_summary(name + "/mean", tf.reduce_mean(kernel))
    return result
When constructing the train_op, you pass is_training=True, and when constructing the valid_op, you pass is_training=False. There is an example of this programming pattern in the Inception model.
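For completeness, a minimal sketch of how the flag might be threaded through, assuming convolution gains the is_training parameter discussed above (the layer shapes, names, and the train_images/valid_images tensors below are illustrative):
def inference(input_data, is_training):
    net = convolution(input_data, kernel_shape=[5, 5, 3, 32], strides=[1, 1, 1, 1],
                      activation=tf.nn.relu, is_training=is_training, name="conv1")
    # ... further layers, each receiving is_training ...
    return net

train_logits = inference(train_images, is_training=True)   # summaries are created once, here
valid_logits = inference(valid_images, is_training=False)  # no summaries, so no duplicate tags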

Another way is to use different name scopes for the training and validation summaries and then merge each scope separately with merge_summary, instead of calling merge_all_summaries.
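A minimal sketch of that alternative, assuming the newer tf.summary API (where, unlike the pre-1.0 tf.scalar_summary used above, a summary's tag inherits the enclosing name scope; kernel stands for any tensor you want to track):
# The same summary name gets distinct tags under different scopes:
with tf.name_scope("train"):
    tf.summary.scalar("conv1_mean", tf.reduce_mean(kernel))  # tag: "train/conv1_mean"
with tf.name_scope("valid"):
    tf.summary.scalar("conv1_mean", tf.reduce_mean(kernel))  # tag: "valid/conv1_mean"

# Merge each scope separately instead of merging everything:
train_summary_op = tf.summary.merge(
    tf.get_collection(tf.GraphKeys.SUMMARIES, scope="train"))
valid_summary_op = tf.summary.merge(
    tf.get_collection(tf.GraphKeys.SUMMARIES, scope="valid"))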

Related

How to evaluate the value of a tensor, from inside the model function of a custom tf.estimator

I am implementing an NLP model based on BERT, using tf.TPUEstimator(). I want to implement layer-wise training, where I select only one layer of the model to train in each epoch. To do this I wanted to change my model_fn and get the value of current_epoch.
I know how to compute the value of current_epoch as a tensor using tf.train.get_or_create_global_step() inside the model_fn, BUT I need to evaluate the value of this tensor in order to select which layer to train and to return the correct train_op to the tf.Estimator (the train_op pertaining to the single layer chosen according to the value of current_epoch).
I am unable to evaluate this tensor (current_epoch / global_step) from inside the model_fn. I tried the following, but the training hangs at the step my_sess.run(global_step.initializer):
global_step = tf.train.get_or_create_global_step()
graph = tf.get_default_graph()
my_sess = tf.Session(graph=graph)
current_epoch = (global_step * full_bs) // train_size
my_sess.run(global_step.initializer)
current_epoch = my_sess.run(current_epoch)
# My program hangs at the initialising step: my_sess.run(global_step.initializer)
Is there any way to evaluate a tensor using the tf.Estimator's default session? How do I get the default session/graph?
Most importantly, what is wrong in my code, and why does the training hang when using TPUs and TPUEstimator?
This is not a direct answer to the OP's second question; it is an answer to the question in the title.
I managed to print a variable's value with get_variable_value, but I am not sure whether this is the optimal way.
With
estimator = tf.contrib.tpu.TPUEstimator(
    # ...
)
out = estimator.get_variable_value('output_bias')
print(type(out))
print(out)
I got
<class 'numpy.ndarray'>
[-0.00107745 0.00107744]
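If you are not sure which variable names exist, the estimator can list them first (a small aside; 'output_bias' above is whatever name your own model gave the variable):
# List every variable name known to the trained estimator,
# then pick the one to pass to get_variable_value
for name in estimator.get_variable_names():
    print(name)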

How to create two graphs for train and validation?

When I read the TensorFlow guidance about graphs and sessions (Graphs and Sessions), I found that it suggests creating two graphs for training and validation.
I think this is reasonable and I want to use it, because my training and validation models are different (e.g. encoder-decoder mode, or dropout). However, I don't know how to make the variables in the trained graph available to the test graph without using tf.train.Saver.
When I create two graphs and create variables inside each graph, I found that the two sets of variables are totally different, as they belong to different graphs.
I have googled a lot and I know there are questions about this problem, such as question1. But there is still no useful answer. Is there any code example, or does anyone know how to create two graphs for training and validation separately? For example:
def train_model():
    g_train = tf.Graph()
    with g_train.as_default():
        # build the training model here
        ...

def validation_model():
    g_test = tf.Graph()
    with g_test.as_default():
        # build the test model here
        ...
One easy way of doing that is to create a 'forward function' that defines the model and changes behaviour based on extra parameters.
Here is an example:
def forward_pass(x, is_training, reuse=tf.AUTO_REUSE, name='model_forward_pass'):
    # Note the reuse argument: it tells the getter to either create the variables or fetch the existing ones
    with tf.variable_scope(name, reuse=reuse):
        x = tf.layers.conv2d(x, ...)
        ...
        x = tf.layers.dense(x, ...)
        x = tf.layers.dropout(x, rate, training=is_training)  # Note the is_training argument
        ...
        return x
Now you can call the forward_pass function anywhere in your code. You simply need to provide the is_training argument to get the correct mode for dropout, for example. The reuse argument will automatically fetch the correct values for your weights as long as the name of the variable_scope is the same.
For example:
train_logits_model1 = forward_pass(x_train, is_training=True, name='model1')
# Graph is defined and dropout is used in training mode
test_logits_model1 = forward_pass(x_test, is_training=False, name='model1')
# Graph is reused but dropout behaviour changes to inference mode
train_logits_model2 = forward_pass(x_train2, is_training=True, name='model2')
# Name changed, so model2 is added to the graph, and dropout is used in training mode
To add to this answer: since you stated that you want to have two separate graphs, you could do that using assign ops:
train_graph = forward_pass(x, is_training=True, reuse=False, name='train_graph')
...
test_graph = forward_pass(x, is_training=False, reuse=False, name='test_graph')
...
train_vars = tf.get_collection('variables', 'train_graph/.*')
test_vars = tf.get_collection('variables', 'test_graph/.*')
test_assign_ops = []
for test, train in zip(test_vars, train_vars):
    test_assign_ops += [tf.assign(test, train)]
assign_op = tf.group(*test_assign_ops)
sess.run(assign_op)  # Replace vars in the test_graph by the ones in train_graph
I'm a big advocate of method 1, as it is way cleaner and reduces memory usage.

'Reuse' a single tensorflow model in different graphs?

Currently, I am working with a pretrained VGG model from the TF-Slim library. My motivation is to generate adversarial examples for a given image for this network. The summary of the task is:
x = tf.placeholder(shape=(None, 32, 32, 3), dtype=tf.float32)
for i in range(2):
    logits = vgg.vgg_16(x, is_training=False, spatial_squeeze=False, fc_conv_padding='SAME')
    x = x + learning_rate * cal_gradient_of_logits_wrt_x(logits)
However, as soon as we enter the second iteration and run logits = vgg.vgg_16(...), we get the following error:
Variable vgg_16/conv1/conv1_1/weights already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?
It is clear that this error occurs because the graph is replicated in the second iteration. As the TF-Slim model doesn't use reuse=True in its scopes, it throws this error (in the second iteration we again ask it to add the VGG layers, which already exist, to the graph).
Is it possible to somehow avoid this error? It should be possible to create the graph for the VGG model once and use it whenever we need to calculate logits.
The reason this should be possible is the example set by Keras. In Keras we can simply define the model once with
model = vgg.VGG16(*args, **kwargs)
Later on, we can calculate logits for different tensors with
logits_1 = model(x1)
logits_2 = model(x2)
In both of these calculations the same model parameters are used, i.e. no such error appears. Is there a way to achieve the same functionality with a TensorFlow model?
In tfslim/models/research/slim/nets/vgg.py, add a reuse parameter to the vgg_16 (or vgg_19) definition:
def vgg_16(inputs,
           num_classes=1000,
           is_training=True,
           dropout_keep_prob=0.5,
           spatial_squeeze=True,
           scope='vgg_16',
           fc_conv_padding='VALID',
           global_pool=False,
           reuse=None):
    ...
then set reuse on the variable scope from that argument:
    with tf.variable_scope(scope, 'vgg_16', [inputs], reuse=reuse) as sc:
        ...
Then, when you invoke the function, pass the parameter as reuse=tf.AUTO_REUSE
vgg.vgg_16(images, num_classes=dataset.num_classes,
           is_training=True, reuse=tf.AUTO_REUSE)
You can use tf.make_template(name, func) to do this; it wraps func so that its variables are created on the first call and reused on every later call.
x = tf.placeholder(shape=(None, 32, 32, 3), dtype=tf.float32)
vgg_model = tf.make_template('vgg_16', vgg.vgg_16, is_training=False,
                             spatial_squeeze=False, fc_conv_padding='SAME')
for i in range(2):
    logits = vgg_model(x)
    x += learning_rate * cal_gradient_of_logits_wrt_x(logits)

How to get weights in tf.layers.dense?

I want to draw the weights of tf.layers.dense in a TensorBoard histogram, but they do not show up in the parameters. How can I do that?
The weights are added as a variable named kernel, so you could use
import os

x = tf.layers.dense(...)
weights = tf.get_default_graph().get_tensor_by_name(
    os.path.split(x.name)[0] + '/kernel:0')
You can obviously replace tf.get_default_graph() with any other graph you are working in.
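Since the original goal was a TensorBoard histogram, the recovered tensor can then be logged directly (the tag string below is arbitrary):
# Add the kernel to the histogram summaries picked up when summaries are merged
tf.summary.histogram('dense_kernel', weights)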
I came across this problem and just solved it. The name of a tf.layers.dense op is not necessarily the same as its kernel name's prefix: my tensor is "dense_2/xxx" but its kernel is "dense_1/kernel:0". To ensure that tf.get_variable works, you had better set name=xxx in the tf.layers.dense call so that the two names share the same prefix. It works as in the demo below:
l = tf.layers.dense(input_tf_xxx, 300, name='ip1')
with tf.variable_scope('ip1', reuse=True):
    w = tf.get_variable('kernel')
By the way, my tf version is 1.3.
The latest TensorFlow layers API creates all its variables using tf.get_variable. This ensures that if you wish to use a variable again, you can just call tf.get_variable and provide the name of the variable you wish to obtain.
In the case of tf.layers.dense, the variable is created as layer_name/kernel. So, you can obtain the variable with:
with tf.variable_scope("layer_name", reuse=True):
weights = tf.get_variable("kernel") # do not specify
# the shape here or it will confuse tensorflow into creating a new one.
[Edit]: The new version of TensorFlow now has both functional and object-oriented interfaces to the layers API. If you need the layers only for computational purposes, the functional API is a good choice; its function names start with lowercase letters, for instance tf.layers.dense(...). Layer objects are created with capitalised names, e.g. tf.layers.Dense(...). Once you have a handle to such a layer object, you can use all of its functionality. To obtain the weights, just use obj.trainable_weights; this returns a list of all the trainable variables found in that layer's scope.
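A minimal sketch of that object-oriented route (the layer size and name below are illustrative):
# Build a Dense layer object; its variables are created on the first call
dense = tf.layers.Dense(units=50, activation=tf.nn.relu, name='dense_obj')
y = dense(x)                       # x: any float tensor of shape (batch, features)
weights = dense.trainable_weights  # [kernel variable, bias variable]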
I am going crazy with TensorFlow.
I run this:
sess.run(x.kernel)
after training, and I get the weights (this works when x is a tf.layers.Dense layer object; kernel is one of the properties described here).
I am saying that I am going crazy because it seems there are a million slightly different ways to do something in TF, and that fragments the tutorials.
Is there anything wrong with
model.get_weights()
After I create a model, compile it and run fit, this function returns a numpy array of the weights for me.
In TF2, if you're inside a @tf.function (graph mode):
weights = optimizer.weights
If you're in eager mode (the default in TF2, except inside @tf.function-decorated functions):
weights = optimizer.get_weights()
In TF2, a layer's weights come back as a list of length 2: weights[0] is the kernel weight and weights[1] is the bias weight. Below, the weights of the second layer (layers[0] is the input layer, which has no weights) are read from a model with input size 784 and layer size 50:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(50, activation="relu", name="dense_1")(inputs)
x = layers.Dense(50, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(...)
model.fit(...)

kernel_weight = model.layers[1].weights[0]
bias_weight = model.layers[1].weights[1]
all_weight = model.layers[1].weights
print(len(all_weight))      # 2
print(kernel_weight.shape)  # (784, 50)
print(bias_weight.shape)    # (50,)
Try making a loop that gets the weights of each layer in your sequential network, printing the name of each layer first, which you can get from:
model.summary()
Then you can get the weights of each layer by running this code:
for layer in model.layers:
    print(layer.name)
    print(layer.get_weights())

Tensorflow: train and test in separate functions

I am trying to use a Tensorflow model in two separate functions: one that trains it, and one used to test it. For example, the training function looks something like this:
graph = tf.Graph()
with graph.as_default():
    tf_dataset = tf.placeholder(tf.float32, shape=(None, num_dims))
    ...
    weights = tf.Variable(tf.truncated_normal([num_dims, num_labels]))
    ...
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    prediction = tf.nn.softmax(logits)
    ...
session = tf.Session(graph=graph)
...
The other, evaluation function would just use prediction with the test data, like so:
session.run(prediction, feed_dict={tf_dataset: test_data})
The problem is, of course, that tf_dataset is not in the scope of the other function. I am fine with returning session and prediction from the training function, but having to share every single placeholder with the evaluation code seems a bit lame.
Is there a way to get the references somehow, from the session or the graph? Also, are there any good practices on how to separate training and evaluation code in Tensorflow?
You could give your placeholders unique names and use those. I.e.,
tf_dataset = tf.placeholder(tf.float32, shape=(None, num_dims), name="datainput")
...
sess.run(..., feed_dict={"datainput:0": mydata})
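If you would rather feed the real tensor, the evaluation function can recover it from the graph by name (a sketch, assuming the placeholder name used above):
# Look the placeholder up in the graph attached to the returned session
tf_dataset = session.graph.get_tensor_by_name("datainput:0")
session.run(prediction, feed_dict={tf_dataset: test_data})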
You can also get name/type pairs for all ops in your graph, so you could recover all the placeholder tensor names that way:
[(op.name + ":0", op.op_def.name) for op in graph.get_operations()]